LArPix+HDF5 Format

This module gives access to the LArPix+HDF5 file format.

File format description

All LArPix+HDF5 files use the HDF5 format so that they can be read and written using any language that has an HDF5 binding. The documentation for the Python h5py binding is at <http://docs.h5py.org>.

The to_file and from_file functions translate between a list of Packet-like objects and an HDF5 data file. from_file can load the full file at once or only a subset of rows (e.g. when the full file is too big to fit in memory). For the most efficient data access, do not rely on from_file; instead, perform analysis directly on the HDF5 data file.
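
As a minimal sketch of working directly on the file with h5py, the snippet below first writes a small stand-in packets dataset to a temporary file. It then reads only a slice of rows and a single field, so the rest of the file never has to be loaded into memory. The two-field dtype and the file name are hypothetical simplifications for illustration, not the real LArPix+HDF5 schema.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical, simplified packet dtype used only for this illustration;
# real files use the full structured dtype described further down.
toy_dtype = np.dtype([("packet_type", "u1"), ("dataword", "u1")])

path = os.path.join(tempfile.mkdtemp(), "toy.h5")
with h5py.File(path, "w") as f:
    rows = np.zeros(1000, dtype=toy_dtype)
    rows["dataword"] = np.arange(1000) % 256
    f.create_dataset("packets", data=rows)

with h5py.File(path, "r") as f:
    packets = f["packets"]
    # Slicing an h5py Dataset reads only the requested rows from disk.
    chunk = packets[100:200]
    # Indexing by field name reads a single column for every row.
    datawords = packets["dataword"]

print(chunk.shape, chunk["dataword"][0], datawords.shape)
```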

File Header

The file header can be found in the /_header HDF5 group. At a minimum, the header will contain the following HDF5 attributes:

  • version: a string containing the LArPix+HDF5 version
  • created: a Unix timestamp of the file’s creation time
  • modified: a Unix timestamp of the file’s last-modified time
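
The header attributes can be read with plain h5py. The sketch below writes a stand-in /_header group to a temporary file so that it runs on its own. The attribute names follow the list above, but the values (and their exact stored types) are assumptions, and files written by to_file may store them differently.

```python
import os
import tempfile
import time

import h5py

path = os.path.join(tempfile.mkdtemp(), "header_demo.h5")

# Write a stand-in header so the example is self-contained; values are made up.
with h5py.File(path, "w") as f:
    header = f.create_group("_header")
    now = time.time()
    header.attrs["version"] = "2.4"
    header.attrs["created"] = now
    header.attrs["modified"] = now

# Read the three required attributes back.
with h5py.File(path, "r") as f:
    attrs = f["_header"].attrs
    version = attrs["version"]
    created = attrs["created"]
    modified = attrs["modified"]

print(version, time.ctime(created))
```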

Versions

The LArPix+HDF5 format is self-describing and versioned. This means that as the format evolves, the files themselves will identify which version of the format should be used to interpret them. When writing a file with to_file, the format version can be specified; by default, the latest version is used. When reading a file with from_file, by default, the format version of the actual file is used. If a specific format version is expected or required, that version can be specified, and a RuntimeError will be raised if a different format version is encountered.

The versions are always in the format major.minor and are stored as strings (e.g. '1.0', '1.5', '2.0').

The minor version will increase if a non-breaking change is made, so that a script compatible with a lower minor version will also work with files that have a higher minor version. E.g. a script designed to work with v1.0 will also work with v1.5. The reverse is not necessarily true: a script designed to work with v1.5 may not work with v1.0 files.

The major version will increase if a breaking change is made. This means that a script designed to work with v1.5 will likely not work with v2.0 files, and vice versa.

File Data

The file data is saved in HDF5 datasets, and the specific data format depends on the LArPix+HDF5 version.

Version 2.4 description

For version 2.4, chip configuration objects can be saved to the 'configs' dataset. For compatibility reasons, only one type of ASIC configuration can be stored per HDF5 file.

The configs dataset contains a timestamped entry for each chip configuration that has been logged.

  • Shape: (N,), N >= 0

  • Attrs: asic_version (U25/unicode string): a global ASIC version that applies to the whole dataset. Depending on the ASIC version, a different-length datatype is used.

  • Datatype: a compound datatype (called “structured type” in h5py/numpy). Keys/fields:

    • timestamp (u8/unsigned long): a DAQ-system Unix timestamp recording when the config was written to the file
    • io_group (u1/unsigned byte): an id associated with the high-level io group associated with this network node
    • io_channel (u1/unsigned byte): the id associated with the mid-level io channel associated with this network node
    • chip_id (u1/unsigned byte): the id associated with the low-level asic
    • registers ((239,)u1/array of unsigned bytes): the value at each of the ASIC's register addresses
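
Because every configs row is timestamped, one plausible way to recover the most recent register state of a chip is to filter on io_group/io_channel/chip_id and take the row with the largest timestamp. The sketch below builds an in-memory numpy array whose dtype is assembled by hand from the field list above (it is not taken from larpix) and fills it with made-up values.

```python
import numpy as np

# Structured dtype mirroring the configs fields listed above; the register
# count of 239 is the value given in this section (it depends on asic_version).
config_dtype = np.dtype([
    ("timestamp", "u8"),
    ("io_group", "u1"),
    ("io_channel", "u1"),
    ("chip_id", "u1"),
    ("registers", "u1", (239,)),
])

# Three made-up config entries; two belong to the same chip.
configs = np.zeros(3, dtype=config_dtype)
configs["timestamp"] = [100, 200, 300]
configs["io_group"] = 1
configs["io_channel"] = [1, 1, 2]
configs["chip_id"] = [11, 11, 12]
configs["registers"][1, 0] = 42  # hypothetical value for register 0

# Select every entry for chip (io_group=1, io_channel=1, chip_id=11).
mask = (
    (configs["io_group"] == 1)
    & (configs["io_channel"] == 1)
    & (configs["chip_id"] == 11)
)
candidates = configs[mask]
latest = candidates[np.argmax(candidates["timestamp"])]

print(latest["timestamp"], latest["registers"][0])
```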

Version 2.3 description

For version 2.3, the receipt_timestamp (u4/unsigned int) field has been added to the packets dataset. Additionally, “empty” fields for data/config write/config read/test packets are now filled according to the bit content of the packet. E.g. a row representing a config write packet will still fill the dataword column as though the packet was a data packet. Finally, there are some moderate performance improvements.

Version 2.2 description

For version 2.2, two new packet types have been introduced to store the data contained in SyncPacket and TriggerPacket objects, with packet type codes 6 and 7, respectively.

SyncPacket will fill the timestamp field with the 32-bit timestamp associated with the sync packet, the dataword field with the value of clk_source (if applicable), and the trigger_type field with the sync type (an unsigned byte).

TriggerPacket will fill the timestamp field with the 32-bit timestamp associated with the trigger packet and the trigger_type field with the trigger bits (an unsigned byte).
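
Given the field reuse described above, sync and trigger information can be pulled out of the packets dataset with a simple filter on the type code. The sketch below uses a hypothetical subset of the packets fields and made-up rows, and it stands in for an array that would normally be read from the file with h5py.

```python
import numpy as np

# Hypothetical subset of the packets dtype, with made-up rows.
# Type codes: 0 = data, 6 = sync, 7 = trigger.
dtype = np.dtype([("packet_type", "u1"), ("timestamp", "u8"),
                  ("dataword", "u1"), ("trigger_type", "u1")])
packets = np.array([
    (0, 1000, 37, 0),  # data packet
    (6, 2000, 1, 5),   # sync packet: dataword = clk_source, trigger_type = sync type
    (7, 2500, 0, 2),   # trigger packet: trigger_type = trigger bits
], dtype=dtype)

syncs = packets[packets["packet_type"] == 6]
triggers = packets[packets["packet_type"] == 7]

sync_times = syncs["timestamp"]
clk_sources = syncs["dataword"]
trigger_bits = triggers["trigger_type"]
print(sync_times, clk_sources, trigger_bits)
```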

Version 2.1 description

For version 2.1, there are two datasets: packets and messages.

The packets dataset contains a list of all of the packets sent and received during a particular time interval.

  • Shape: (N,), N >= 0

  • Datatype: a compound datatype (called “structured type” in h5py/numpy). Not all fields are relevant for each packet. Unused fields are set to a default value of 0 or the empty string. Keys/fields:

    • io_group (u1/unsigned byte): an id associated with the high-level io group associated with this packet
    • io_channel (u1/unsigned byte): the id associated with the mid-level io channel associated with this packet
    • packet_type (u1/unsigned byte): the packet type code, which can be interpreted according to the map stored in the ‘packets’ attribute ‘packet_types’
    • chip_id (u1/unsigned byte): the LArPix chip id
    • parity (u1/unsigned byte): the packet parity bit (0 or 1)
    • valid_parity (u1/unsigned byte): 1 if the packet parity is valid (odd), 0 if it is invalid
    • downstream_marker (u1/unsigned byte): a marker to indicate the hydra io network direction for this packet
    • channel_id (u1/unsigned byte): the ASIC channel
    • timestamp (u8/unsigned 8-byte long int): the timestamp associated with the packet. Caution: this field does “triple duty”: it holds the ASIC timestamp in data packets (packet_type == 0), the global timestamp in timestamp packets (packet_type == 4), and the message timestamp in message packets (packet_type == 5).
    • first_packet (u1/unsigned byte): indicates if this is the first packet received in a trigger burst (v2.1 or newer only)
    • dataword (u1/unsigned byte): the ADC data word
    • trigger_type (u1/unsigned byte): the trigger type associated with this packet
    • local_fifo (u1/unsigned byte): 1 if the channel FIFO is >50% full, 3 if the channel FIFO is 100% full
    • shared_fifo (u1/unsigned byte): 1 if the chip FIFO is >50% full, 3 if the chip FIFO is 100% full
    • register_address (u1/unsigned byte): the configuration register index
    • register_data (u1/unsigned byte): the configuration register value
    • direction (u1/unsigned byte): 0 if packet was sent to ASICs, 1 if packet was received from ASICs.
    • local_fifo_events (u1/unsigned byte): number of packets in the channel FIFO (only valid if FIFO diagnostics are enabled)
    • shared_fifo_events (u2/unsigned 2-byte int): number of packets in the chip FIFO (only valid if FIFO diagnostics are enabled)
    • counter (u4/unsigned 4-byte int): the message index (only valid for message type packets)
    • fifo_diagnostics_enabled (u1/unsigned byte): flag for when fifo diagnostics are enabled (1 if enabled, 0 if not)
  • Packet types lookup: the packets dataset has an attribute 'packet_types' which contains the following lookup table for packets:

    0: 'data',
    1: 'test',
    2: 'config write',
    3: 'config read',
    4: 'timestamp',
    5: 'message',
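
The lookup table can be used to summarize a packets dataset by type. The sketch below hard-codes the table listed above, because how the 'packet_types' attribute is encoded inside the file is not specified here. It counts a made-up packet_type column standing in for one read from the file.

```python
import numpy as np

# Lookup table copied from the list above (hard-coded; the attribute's
# on-disk encoding is not assumed here).
PACKET_TYPES = {0: "data", 1: "test", 2: "config write",
                3: "config read", 4: "timestamp", 5: "message"}

# Made-up packet_type column, as it might be read from the packets dataset.
packet_type = np.array([0, 0, 4, 2, 3, 0, 5], dtype="u1")

# Count how many packets of each type are present.
codes, counts = np.unique(packet_type, return_counts=True)
summary = {PACKET_TYPES[int(c)]: int(n) for c, n in zip(codes, counts)}
print(summary)
```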
    

The messages dataset has the full messages referred to by message packets in the packets dataset.

  • Shape: (N,), N >= 0

  • Datatype: a compound datatype with fields:

    • message (S64/64-character string): the message
    • timestamp (u8/unsigned 8-byte long int): the timestamp associated with the message
    • index (u4/unsigned 4-byte int): the message index, which should be equal to the row index in the messages dataset
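
Since the counter field of a message packet holds the message index, and that index should equal the row index in the messages dataset, resolving a message packet to its text can be a direct row lookup. The sketch below uses in-memory arrays with made-up contents and a reduced set of packet fields, in place of datasets read from a file.

```python
import numpy as np

# Messages dataset layout from the field list above, with made-up rows.
message_dtype = np.dtype([("message", "S64"), ("timestamp", "u8"), ("index", "u4")])
messages = np.array([(b"run start", 100, 0), (b"run stop", 900, 1)],
                    dtype=message_dtype)

# Reduced packets layout (hypothetical subset of fields), made-up rows.
packet_dtype = np.dtype([("packet_type", "u1"), ("timestamp", "u8"), ("counter", "u4")])
packets = np.array([(5, 100, 0), (0, 150, 0), (5, 900, 1)], dtype=packet_dtype)

# Keep only message packets (type 5); counter is the row index into messages.
msg_packets = packets[packets["packet_type"] == 5]
texts = [messages[c]["message"].decode() for c in msg_packets["counter"]]
print(texts)
```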

Version 1.0 description

For version 1.0, there are two datasets: packets and messages.

The packets dataset contains a list of all of the packets sent and received during a particular time interval.

  • Shape: (N,), N >= 0

  • Datatype: a compound datatype (called “structured type” in h5py/numpy). Not all fields are relevant for each packet. Unused fields are set to a default value of 0 or the empty string. Keys/fields:

    • chip_key (S32/32-character string): the chip key identifying the ASIC associated with this packet
    • type (u1/unsigned byte): the packet type code, which can be interpreted according to the map stored in the packets dataset attribute ‘packet_types’
    • chipid (u1/unsigned byte): the LArPix chipid
    • parity (u1/unsigned byte): the packet parity bit (0 or 1)
    • valid_parity (u1/unsigned byte): 1 if the packet parity is valid (odd), 0 if it is invalid
    • channel (u1/unsigned byte): the ASIC channel
    • timestamp (u8/unsigned 8-byte long int): the timestamp associated with the packet. Caution: this field does “triple duty”: it holds the ASIC timestamp in data packets (type == 0), the global timestamp in timestamp packets (type == 4), and the message timestamp in message packets (type == 5).
    • adc_counts (u1/unsigned byte): the ADC data word
    • fifo_half (u1/unsigned byte): 1 if the FIFO half full flag is present, 0 otherwise.
    • fifo_full (u1/unsigned byte): 1 if the FIFO full flag is present, 0 otherwise.
    • register (u1/unsigned byte): the configuration register index
    • value (u1/unsigned byte): the configuration register value
    • counter (u4/unsigned 4-byte int): the test counter value, or the message index. Caution: this field does “double duty” as the counter for test packets (type == 1) and as the message index for message packets (type == 5).
    • direction (u1/unsigned byte): 0 if packet was sent to ASICs, 1 if packet was received from ASICs.
  • Packet types lookup: the packets dataset has an attribute 'packet_types' which contains the following lookup table for packets:

    0: 'data',
    1: 'test',
    2: 'config write',
    3: 'config read',
    4: 'timestamp',
    5: 'message',
    

The messages dataset has the full messages referred to by message packets in the packets dataset.

  • Shape: (N,), N >= 0

  • Datatype: a compound datatype with fields:

    • message (S64/64-character string): the message
    • timestamp (u8/unsigned 8-byte long int): the timestamp associated with the message
    • index (u4/unsigned 4-byte int): the message index, which should be equal to the row index in the messages dataset

Examples

Plot a histogram of ADC counts (selecting data packets only). The field names below are those used by v2.x files; for v1.0 files, use 'adc_counts' and 'type' instead.

>>> import matplotlib.pyplot as plt
>>> import h5py
>>> f = h5py.File('output.h5', 'r')
>>> packets = f['packets']
>>> plt.hist(packets['dataword'][packets['packet_type'] == 0])
>>> plt.show()
>>> f.close()

Load the first 10 packets in a file into Packet objects and print any MessagePacket packets to the console

>>> from larpix.format.hdf5format import from_file
>>> from larpix.larpix import MessagePacket
>>> result = from_file('output.h5', end=10)
>>> for packet in result['packets']:
...     if isinstance(packet, MessagePacket):
...         print(packet)

larpix.format.hdf5format.latest_version = '2.4'

The most recent LArPix+HDF5 format version.

larpix.format.hdf5format.dtypes

The dtype specification used in the HDF5 files.

Structure: {version: {dset_name: [structured dtype fields]}}

larpix.format.hdf5format.dtype_property_index_lookup

A map between attribute name and “column index” in the structured dtypes.

Structure: {version: {dset_name: {field_name: index}}}
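
The two structures are related: a field's "column index" is its position in the structured dtype field list. The sketch below derives an index lookup from a hypothetical dtypes excerpt that has the documented shape. It assumes each field entry is a numpy-style (name, format[, shape]) tuple, and the excerpt reuses the v2.4 configs fields described above; it is not the module's actual dtypes object.

```python
# Hypothetical excerpt with the documented {version: {dset_name: [fields]}}
# shape; not the module's actual dtypes object.
dtypes = {
    "2.4": {
        "configs": [
            ("timestamp", "u8"),
            ("io_group", "u1"),
            ("io_channel", "u1"),
            ("chip_id", "u1"),
            ("registers", "u1", (239,)),
        ],
    },
}

# Build {version: {dset_name: {field_name: index}}} from the field lists.
index_lookup = {
    version: {
        dset_name: {field[0]: i for i, field in enumerate(fields)}
        for dset_name, fields in dsets.items()
    }
    for version, dsets in dtypes.items()
}

# Use the index to pull a field out of a plain row tuple (made-up values).
row = (1600000000, 1, 2, 11, bytes(239))
chip_index = index_lookup["2.4"]["configs"]["chip_id"]
print(chip_index, row[chip_index])
```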

larpix.format.hdf5format.to_file(filename, packet_list=None, chip_list=None, mode='a', version=None, workers=None)

Save the given packets to the given file.

This method can be used to update an existing file.

Parameters:
  • filename – the name of the file to save to
  • packet_list – any iterable of objects of type Packet, TimestampPacket, SyncPacket, or TriggerPacket.
  • chip_list – any iterable of objects of type Chip.
  • mode – optional, the “file mode” to open the data file (default: 'a')
  • version – optional, the LArPix+HDF5 format version to use. If writing a new file and version is unspecified or None, the latest version will be used. If writing an existing file and version is unspecified or None, the existing file’s version will be used. If writing an existing file and version is specified and does not exactly match the existing file’s version, a RuntimeError will be raised. (default: None)
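
The version rule for to_file can be sketched as a small pure-Python function. The function name and the hard-coded latest version are hypothetical stand-ins used for illustration; this is a paraphrase of the documented behavior, not the library's implementation.

```python
LATEST_VERSION = "2.4"  # assumed to match latest_version documented in this module


def resolve_write_version(requested, existing):
    """Pick the format version to write, following the rules above.

    requested: version passed by the caller, or None.
    existing: version stored in an existing file, or None for a new file.
    """
    if existing is None:
        # New file: use the requested version, else the latest one.
        return requested if requested is not None else LATEST_VERSION
    if requested is None or requested == existing:
        # Existing file: keep its version.
        return existing
    raise RuntimeError(
        f"requested version {requested} does not match file version {existing}"
    )
```

For example, resolve_write_version(None, None) gives '2.4', and resolve_write_version('2.4', '2.1') raises a RuntimeError because the versions are not an exact match.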

larpix.format.hdf5format.from_file(filename, version=None, start=None, end=None, load_configs=None)

Read the data from the given file into LArPix Packet objects.

Parameters:
  • filename – the name of the file to read
  • version – the format version. Specify this parameter to enforce a version check. When a specific version such as '1.5' is specified, a RuntimeError will be raised if the stored format version number is not an exact match. If a version is prefixed with '~' such as '~1.5', a RuntimeError will be raised if the stored format version is incompatible with the specified version. Compatible versions are those with the same major version and at least the same minor version. E.g. for '~1.5', versions from v1.5 up to, but not including, v2.0 are compatible. If unspecified or None, the stored format version will be used.
  • start – the index of the first row to read
  • end – the index after the last row to read (same semantics as Python range)
  • load_configs – a flag indicating whether configs should be fetched from the file: a value of True loads all configs, and a value of type slice loads only the specified subset.
Returns:
  • packet_dict – a dict with keys 'packets' containing a list of packet objects; 'configs' containing a list of chip objects; and 'created', 'modified', and 'version' containing the file metadata.
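
The version check described for from_file's version parameter (exact match, '~' compatibility, or None) can be sketched in pure Python. The function name check_version is hypothetical, and this is a paraphrase of the documented rule rather than the library's code. Versions are compared as integer pairs, not as strings.

```python
def check_version(requested, stored):
    """Apply the documented from_file version rule; return the version to use.

    requested: None, an exact version such as '1.5', or '~1.5' for a
    compatibility check. stored: the version recorded in the file header.
    """
    if requested is None:
        return stored
    if requested.startswith("~"):
        req_major, req_minor = (int(x) for x in requested[1:].split("."))
        sto_major, sto_minor = (int(x) for x in stored.split("."))
        # Compatible: same major version and at least the same minor version.
        if sto_major != req_major or sto_minor < req_minor:
            raise RuntimeError(f"incompatible file version {stored} for {requested}")
        return stored
    if requested != stored:
        raise RuntimeError(f"file version {stored} does not match {requested}")
    return stored
```

With these rules, check_version('~2.1', '2.4') accepts the file, while check_version('~2.1', '1.0') and check_version('2.4', '2.3') both raise a RuntimeError.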