hdmf.backends.hdf5.h5tools module

class hdmf.backends.hdf5.h5tools.HDF5IO(path=None, mode='r', manager=None, comm=None, file=None, driver=None, aws_region=None, herd_path=None)

Bases: HDMFIO

Open an HDF5 file for IO.

Parameters:
  • path (str or Path) – the path to the HDF5 file

  • mode (str) – the mode to open the HDF5 file with, one of (“w”, “r”, “r+”, “a”, “w-”, “x”). See h5py.File for more details.

  • manager (TypeMap or BuildManager) – the BuildManager or a TypeMap to construct a BuildManager to use for I/O

  • comm (Intracomm) – the MPI communicator to use for parallel I/O

  • file (File or S3File or RemFile) – a pre-existing h5py.File, S3File, or RemFile object

  • driver (str) – driver for h5py to use when opening HDF5 file

  • aws_region (str) – If driver is ros3, then specify the aws region of the url.

  • herd_path (str) – The path to read/write the HERD file
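The mode strings listed above are the ones defined by h5py.File. As a quick reference, the sketch below restates their meaning as described in the h5py documentation. The `MODE_MEANINGS` dict and the `describe_mode` helper are hypothetical names used only for illustration; they are not part of HDMF or h5py.

```python
# Meaning of each file mode, as documented for h5py.File.
# MODE_MEANINGS and describe_mode are illustrative names, not HDMF API.
MODE_MEANINGS = {
    "r": "read-only; the file must exist",
    "r+": "read/write; the file must exist",
    "w": "create the file; truncate it if it exists",
    "w-": "create the file; fail if it exists",
    "x": "create the file; fail if it exists",
    "a": "read/write if the file exists; create it otherwise",
}


def describe_mode(mode):
    """Return a short description of an HDF5 file mode, or raise ValueError."""
    try:
        return MODE_MEANINGS[mode]
    except KeyError:
        valid = sorted(MODE_MEANINGS)
        raise ValueError(f"Unsupported mode {mode!r}; expected one of {valid}") from None


print(describe_mode("r+"))
```

Note that "w-" and "x" are two spellings of the same behavior.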

static can_read(path)

Determines whether a given path is readable by the HDF5IO class

property comm

The MPI communicator to use for parallel I/O.

property driver
property aws_region
classmethod load_namespaces(namespace_catalog, path=None, namespaces=None, file=None, driver=None, aws_region=None)

Load cached namespaces from a file into the provided NamespaceCatalog or TypeMap.

If file is not supplied, then an h5py.File object will be opened for the given path, the namespaces will be read, and the File object will be closed. If file is supplied, then the given File object will be read from and not closed.

Raises:

ValueError – if both path and file are supplied but path is not the same as the path of file

Parameters:
  • namespace_catalog (NamespaceCatalog or TypeMap) – the NamespaceCatalog or TypeMap to load namespaces into

  • path (str or Path) – the path to the HDF5 file

  • namespaces (list) – the namespaces to load

  • file (File) – a pre-existing h5py.File object

  • driver (str) – driver for h5py to use when opening HDF5 file

  • aws_region (str) – If driver is ros3, then specify the aws region of the url.

Returns:

dict mapping the names of the loaded namespaces to a dict mapping included namespace names and the included data types

Return type:

dict
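The open/close rule described above (a file opened from path is closed again, a supplied file is left open, and a mismatching path raises ValueError) can be sketched as follows. This is not the HDMF implementation: `FakeFile` is a hypothetical stand-in for h5py.File, `resolve_file` is an illustrative helper, and comparing absolute paths as well as raising when neither argument is given are assumptions.

```python
import os
from contextlib import contextmanager


class FakeFile:
    """Hypothetical stand-in for h5py.File that only tracks name and open state."""

    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def resolve_file(path=None, file=None):
    """Yield a file object; close it on exit only if it was opened here."""
    if file is not None:
        # Both given: they must refer to the same location (assumed check).
        if path is not None and os.path.abspath(path) != os.path.abspath(file.filename):
            raise ValueError(f"path {path!r} does not match file {file.filename!r}")
        yield file  # supplied by the caller, so it is left open
    elif path is not None:
        opened = FakeFile(path)
        try:
            yield opened
        finally:
            opened.close()  # opened here, so closed here
    else:
        raise ValueError("either path or file must be supplied")


with resolve_file(path="data.h5") as f:
    pass
print(f.closed)  # True: the helper opened it, so the helper closed it

shared = FakeFile("data.h5")
with resolve_file(file=shared) as g:
    pass
print(shared.closed)  # False: the caller still owns the file
```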

load_namespaces_io(namespace_catalog, namespaces=None)

Load cached namespaces from this HDF5IO object into the provided NamespaceCatalog or TypeMap.

Parameters:
  • namespace_catalog (NamespaceCatalog or TypeMap) – the NamespaceCatalog or TypeMap to load namespaces into

  • namespaces (list) – the namespaces to load

classmethod get_namespaces(path=None, file=None, driver=None, aws_region=None)

Get the names and versions of the cached namespaces from a file.

If file is not supplied, then an h5py.File object will be opened for the given path, the namespaces will be read, and the File object will be closed. If file is supplied, then the given File object will be read from and not closed.

If there are multiple versions of a namespace cached in the file, then only the latest one (using alphanumeric ordering) is returned. This is the version of the namespace that is loaded by HDF5IO.load_namespaces(…).

Raises:

ValueError – if both path and file are supplied but path is not the same as the path of file.

Parameters:
  • path (str or Path) – the path to the HDF5 file

  • file (File) – a pre-existing h5py.File object

  • driver (str) – driver for h5py to use when opening HDF5 file

  • aws_region (str) – If driver is ros3, then specify the aws region of the url.

Returns:

dict mapping names to versions of the namespaces in the file

Return type:

dict
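Because the latest version is chosen by alphanumeric ordering, the result can differ from what semantic versioning would suggest. The sketch below assumes that "alphanumeric ordering" means plain string comparison; `latest_version` is a hypothetical helper, not an HDMF function.

```python
def latest_version(versions):
    """Return the last version string under plain string (alphanumeric) ordering."""
    return sorted(versions)[-1]


# "9" sorts after "1" character by character, so "2.9.0" comes after "2.10.0".
print(latest_version(["2.9.0", "2.10.0"]))  # 2.9.0
print(latest_version(["1.0.0", "1.1.0"]))   # 1.1.0
```

If a file caches versions whose components have different numbers of digits, it may be worth checking the returned version explicitly.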

write(container, cache_spec=True, link_data=True, exhaust_dci=True, herd=None, expandable=True)

Write the container to an HDF5 file.

Parameters:
  • container (Container) – the Container object to write

  • cache_spec (bool) – If True (default), cache specification to file (highly recommended). If False, do not cache specification to file. The appropriate specification will then need to be loaded prior to reading the file.

  • link_data (bool) – If True (default), create external links to HDF5 Datasets. If False, copy HDF5 Datasets.

  • exhaust_dci (bool) – If True (default), exhaust DataChunkIterators one at a time. If False, exhaust them concurrently.

  • herd (HERD) – A HERD object to populate with references.

  • expandable (bool) – If True (default), datasets will be created as expandable by setting the maxshape based on the matching shape defined in the spec.
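To illustrate the two exhaust_dci strategies, the sketch below contrasts draining iterators one at a time with interleaving them. It uses plain Python iterators rather than DataChunkIterator objects, and it assumes that "concurrently" means taking one chunk from each iterator in turn (round-robin); the function names are hypothetical.

```python
from collections import deque


def exhaust_one_at_a_time(named_iterators):
    """Drain each iterator completely before moving on to the next one."""
    order = []
    for name, iterator in named_iterators:
        for chunk in iterator:
            order.append((name, chunk))
    return order


def exhaust_round_robin(named_iterators):
    """Take one chunk from each iterator in turn until all are exhausted."""
    order = []
    queue = deque(named_iterators)
    while queue:
        name, iterator = queue.popleft()
        try:
            chunk = next(iterator)
        except StopIteration:
            continue  # this iterator is done; do not re-queue it
        order.append((name, chunk))
        queue.append((name, iterator))
    return order


print(exhaust_one_at_a_time([("a", iter([0, 1])), ("b", iter([0, 1]))]))
# [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
print(exhaust_round_robin([("a", iter([0, 1])), ("b", iter([0, 1]))]))
# [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Interleaving matters when several datasets are fed from the same stream and none of them can be produced in full before the others.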

export(src_io, container=None, write_args=None, cache_spec=True)

Export data read from a file from any backend to HDF5.

See hdmf.backends.io.HDMFIO.export for more details.

Parameters:
  • src_io (HDMFIO) – the HDMFIO object for reading the data to export

  • container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported

  • write_args (dict) – arguments to pass to write_builder

  • cache_spec (bool) – whether to cache the specification to file

classmethod export_io(path, src_io, comm=None, container=None, write_args=None, cache_spec=True)

Export from one backend to HDF5 (class method).

Convenience function for export where you do not need to instantiate a new HDF5IO object for writing. An HDF5IO object is created with mode ‘w’ and the given arguments.

Example usage:

from hdmf.backends.hdf5 import HDF5IO

old_io = HDF5IO('old.h5', 'r')
HDF5IO.export_io(path='new_copy.h5', src_io=old_io)

See export for more details.

Parameters:
  • path (str) – the path to the destination HDF5 file

  • src_io (HDMFIO) – the HDMFIO object for reading the data to export

  • comm (Intracomm) – the MPI communicator to use for parallel I/O

  • container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported

  • write_args (dict) – arguments to pass to write_builder

  • cache_spec (bool) – whether to cache the specification to file

read()

Read a container from the IO source.

Returns:

the Container object that was read in

Return type:

Container

read_builder()

Read data and return the GroupBuilder representing it.

NOTE: On read, Builder.source will usually not be set on the Builders.

NOTE: Builder.location is used internally to ensure correct handling of links (in particular on export) and should be set on read for all GroupBuilder, DatasetBuilder, and LinkBuilder objects.

Returns:

a GroupBuilder representing the data object

Return type:

GroupBuilder

get_written(builder)

Return True if this builder has been written to (or read from) disk by this IO object, False otherwise.

Parameters:

builder (Builder) – Builder object to get the written flag for

Returns:

True if the builder is found in self._written_builders using the builder ID, False otherwise
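The lookup described above is keyed on the builder ID rather than on equality. A minimal sketch of that idea follows, assuming the ID is the value returned by Python's built-in `id()`; `WrittenTracker` and `FakeBuilder` are hypothetical names and do not reflect how HDMF stores its written builders internally.

```python
class FakeBuilder:
    """Hypothetical stand-in for a Builder object."""


class WrittenTracker:
    """Track which builder objects have been written, keyed by object identity."""

    def __init__(self):
        # Map id(builder) -> builder. Holding the reference keeps the id valid
        # for as long as the entry exists.
        self._written_builders = {}

    def set_written(self, builder):
        self._written_builders[id(builder)] = builder

    def get_written(self, builder):
        return id(builder) in self._written_builders


tracker = WrittenTracker()
first, second = FakeBuilder(), FakeBuilder()
tracker.set_written(first)
print(tracker.get_written(first))   # True
print(tracker.get_written(second))  # False: a different object, never marked
```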

get_builder(h5obj)

Get the builder for the corresponding h5py Group or Dataset

Raises:

ValueError – When no builder has been constructed yet for the given h5py object

Parameters:

h5obj (Dataset or Group) – the HDF5 object to get the corresponding Builder object for

get_container(h5obj)

Get the container for the corresponding h5py Group or Dataset

Raises:

ValueError – When no builder has been constructed yet for the given h5py object

Parameters:

h5obj (Dataset or Group) – the HDF5 object to get the corresponding Container/Data object for

open()

Open this HDMFIO object for writing of the builder

close(close_links=True)

Close this file and any files linked to from this file.

Parameters:

close_links (bool) – Whether to close all files linked to from this file. (default: True)

is_open() → bool

Check whether this HDF5IO object is open for reading/writing.

close_linked_files()

Close all opened, linked-to files.

MacOS and Linux automatically release the linked-to file after the linking file is closed, but Windows does not, which prevents the linked-to file from being deleted or truncated. Use this method to close all opened, linked-to files.

write_builder(builder, link_data=True, exhaust_dci=True, export_source=None, expandable=True)
Parameters:
  • builder (GroupBuilder) – the GroupBuilder object representing the HDF5 file

  • link_data (bool) – Whether to link to (True) or copy (False) HDF5 Datasets, unless specified otherwise

  • exhaust_dci (bool) – If True (default), exhaust DataChunkIterators one at a time. If False, exhaust them concurrently

  • export_source (str) – The source of the builders when exporting

  • expandable (bool) – If True (default), datasets will be created as expandable by setting the maxshape based on the matching shape defined in the spec.

classmethod get_type(data)
set_attributes(obj, attributes)
Parameters:
  • obj (Group or Dataset) – the HDF5 object to add attributes to

  • attributes (dict) – a dict containing the attributes on the Group or Dataset, indexed by attribute name

write_group(parent, builder, link_data=True, exhaust_dci=True, export_source=None, expandable=True)
Parameters:
  • parent (Group) – the parent HDF5 object

  • builder (GroupBuilder) – the GroupBuilder to write

  • link_data (bool) – Whether to link to (True) or copy (False) HDF5 Datasets, unless specified otherwise

  • exhaust_dci (bool) – If True (default), exhaust DataChunkIterators one at a time. If False, exhaust them concurrently

  • export_source (str) – The source of the builders when exporting

  • expandable (bool) – If True (default), datasets will be created as expandable by setting the maxshape based on the matching shape defined in the spec.

Returns:

the Group that was created

Return type:

Group

write_link(parent, builder, export_source=None)
Parameters:
  • parent (Group) – the parent HDF5 object

  • builder (LinkBuilder) – the LinkBuilder to write

  • export_source (str) – The source of the builders when exporting

Returns:

the Link that was created

Return type:

SoftLink or ExternalLink

write_dataset(parent, builder, link_data=True, exhaust_dci=True, export_source=None, expandable=True)

Write a dataset to HDF5

The function uses other dataset-dependent write functions, e.g., __scalar_fill__, __list_fill__, and __setup_chunked_dset__, to write the data.

Parameters:
  • parent (Group) – the parent HDF5 object

  • builder (DatasetBuilder) – the DatasetBuilder to write

  • link_data (bool) – Whether to link to (True) or copy (False) HDF5 Datasets, unless specified otherwise

  • exhaust_dci (bool) – If True (default), exhaust DataChunkIterators one at a time. If False, exhaust them concurrently

  • export_source (str) – The source of the builders when exporting

  • expandable (bool) – If True (default), datasets will be created as expandable by setting the maxshape based on the matching shape defined in the spec.

Returns:

the Dataset that was created

Return type:

Dataset
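The expandable option derives maxshape from the shape in the spec that matches the data. The sketch below shows one way such matching could work. It is an assumption, not the HDMF implementation: it treats None in a spec shape as "any length", picks the first spec shape whose rank and fixed dimensions agree with the data, and uses the hypothetical helper name `matching_maxshape`.

```python
def matching_maxshape(data_shape, spec_shapes):
    """Return the first spec shape compatible with data_shape, or None.

    A spec shape is compatible if it has the same number of dimensions and
    every fixed (non-None) dimension equals the corresponding data dimension.
    """
    for spec_shape in spec_shapes:
        if len(spec_shape) != len(data_shape):
            continue
        if all(dim is None or dim == size for dim, size in zip(spec_shape, data_shape)):
            return tuple(spec_shape)
    return None


# A 10 x 3 array matches the 2-D spec shape; its first axis stays unlimited.
print(matching_maxshape((10, 3), [(None,), (None, 3)]))  # (None, 3)
# No compatible spec shape: the caller would fall back to a fixed-size dataset.
print(matching_maxshape((10, 4), [(None,), (None, 3)]))  # None
```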

property mode

Return the HDF5 file mode. One of (“w”, “r”, “r+”, “a”, “w-”, “x”).

classmethod set_dataio(data=None, maxshape=None, chunks=None, compression=None, compression_opts=None, fillvalue=None, shuffle=None, fletcher32=None, link_data=False, allow_plugin_filters=False, shape=None, dtype=None)

Wrap the given Data object with an H5DataIO.

This method is provided merely for convenience. It is the equivalent of the following:

from hdmf.backends.hdf5 import H5DataIO
data = ...
data = H5DataIO(data)

Parameters:
  • data (ndarray or list or tuple or Dataset or Iterable) – the data to be written. NOTE: If an h5py.Dataset is used, all other settings but link_data will be ignored as the dataset will either be linked to or copied as is in H5DataIO.

  • maxshape (tuple) – Dataset will be resizable up to this shape. Automatically enables chunking. Use None for the axes you want to be unlimited.

  • chunks (bool or tuple) – Chunk shape or True to enable auto-chunking

  • compression (str or bool or int) – Compression strategy. If a bool is given, then gzip compression will be used by default. http://docs.h5py.org/en/latest/high/dataset.html#dataset-compression

  • compression_opts (int or tuple) – Parameter for compression filter

  • fillvalue (None) – Value to be returned when reading uninitialized parts of the dataset

  • shuffle (bool) – Enable shuffle I/O filter. http://docs.h5py.org/en/latest/high/dataset.html#dataset-shuffle

  • fletcher32 (bool) – Enable fletcher32 checksum. http://docs.h5py.org/en/latest/high/dataset.html#dataset-fletcher32

  • link_data (bool) – If data is an h5py.Dataset, whether it should be linked to or copied. NOTE: This parameter is only allowed if data is an h5py.Dataset

  • allow_plugin_filters (bool) – Enable passing dynamically loaded filters as compression parameter

  • shape (tuple) – the shape of the new dataset, used only if data is None

  • dtype (str or type or dtype) – the data type of the new dataset, used only if data is None
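The maxshape convention above, where None marks an unlimited axis, can be made concrete with a small check of which shapes a dataset may later be resized to. `fits_within` is a hypothetical helper written for illustration; it does not call HDMF or h5py.

```python
def fits_within(shape, maxshape):
    """Return True if `shape` is allowed for a dataset created with `maxshape`."""
    if len(shape) != len(maxshape):
        return False  # the number of dimensions cannot change
    # None means the axis is unlimited; otherwise the size must not exceed the limit.
    return all(limit is None or size <= limit for size, limit in zip(shape, maxshape))


maxshape = (None, 3)  # unlimited rows, at most 3 columns
print(fits_within((1000, 3), maxshape))  # True
print(fits_within((10, 4), maxshape))    # False
```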

static generate_dataset_html(dataset)

Generates an HTML representation of a dataset for the HDF5IO class