hdmf.backends.hdf5.h5tools module
- class hdmf.backends.hdf5.h5tools.HDF5IO(path=None, mode='r', manager=None, comm=None, file=None, driver=None, aws_region=None, herd_path=None)
Bases: HDMFIO
Open an HDF5 file for IO.
- Parameters:
mode (str) – the mode to open the HDF5 file with, one of (“w”, “r”, “r+”, “a”, “w-”, “x”). See h5py.File for more details.
manager (TypeMap or BuildManager) – the BuildManager or a TypeMap to construct a BuildManager to use for I/O
comm (Intracomm) – the MPI communicator to use for parallel I/O
file (File or S3File or RemFile) – a pre-existing h5py.File, S3File, or RemFile object
driver (str) – driver for h5py to use when opening HDF5 file
aws_region (str) – If driver is ros3, then specify the aws region of the url.
herd_path (str) – The path to read/write the HERD file
- static can_read(path)
Determines whether a given path is readable by the HDF5IO class
- property comm
The MPI communicator to use for parallel I/O.
- property driver
- property aws_region
- classmethod load_namespaces(namespace_catalog, path=None, namespaces=None, file=None, driver=None, aws_region=None)
Load cached namespaces from a file into the provided NamespaceCatalog or TypeMap.
If file is not supplied, then an h5py.File object will be opened for the given path, the namespaces will be read, and the File object will be closed. If file is supplied, then the given File object will be read from and not closed.
- Raises:
ValueError – if both path and file are supplied but path is not the same as the path of file
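The path/file consistency rule can be illustrated with a minimal sketch. This is not hdmf's code: `FakeFile` and `check_path_matches_file` are hypothetical names, and comparing absolute paths against a `filename` attribute is an assumption about how such a check might be written.

```python
import os


class FakeFile:
    """Hypothetical stand-in for an open file object that exposes `filename`."""

    def __init__(self, filename):
        self.filename = filename


def check_path_matches_file(path, file):
    """Raise ValueError when both arguments are given but point to different files."""
    if path is None or file is None:
        return  # nothing to compare when only one of the two is supplied
    # Assumption: paths are compared after normalizing to absolute form.
    if os.path.abspath(path) != os.path.abspath(file.filename):
        raise ValueError(
            f"path {path!r} does not match the path of file {file.filename!r}"
        )
```

Under this sketch, passing only `path` or only `file` never raises; the error is reserved for the case where the two disagree.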
- Parameters:
namespace_catalog (NamespaceCatalog or TypeMap) – the NamespaceCatalog or TypeMap to load namespaces into
namespaces (list) – the namespaces to load
file (File) – a pre-existing h5py.File object
driver (str) – driver for h5py to use when opening HDF5 file
aws_region (str) – If driver is ros3, then specify the aws region of the url.
- Returns:
dict mapping the names of the loaded namespaces to a dict mapping included namespace names and the included data types
- Return type:
- load_namespaces_io(namespace_catalog, namespaces=None)
Load cached namespaces from this HDF5IO object into the provided NamespaceCatalog or TypeMap.
- Parameters:
namespace_catalog (NamespaceCatalog or TypeMap) – the NamespaceCatalog or TypeMap to load namespaces into
namespaces (list) – the namespaces to load
- classmethod get_namespaces(path=None, file=None, driver=None, aws_region=None)
Get the names and versions of the cached namespaces from a file.
If file is not supplied, then an h5py.File object will be opened for the given path, the namespaces will be read, and the File object will be closed. If file is supplied, then the given File object will be read from and not closed.
If there are multiple versions of a namespace cached in the file, then only the latest one (using alphanumeric ordering) is returned. This is the version of the namespace that is loaded by HDF5IO.load_namespaces(…).
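The "latest version using alphanumeric ordering" rule can be sketched in plain Python. This is an illustration only: `latest_cached_version` and the namespace names are hypothetical, and it assumes the ordering is ordinary string comparison rather than semantic-version comparison.

```python
def latest_cached_version(versions):
    """Return the last version string under plain string (alphanumeric) ordering."""
    return sorted(versions)[-1]


# Hypothetical cache contents: namespace name -> cached version strings.
cached = {
    "my-namespace": ["1.5.0", "1.8.0"],
    "my-extension": ["0.5.0"],
}

latest = {name: latest_cached_version(vs) for name, vs in cached.items()}
# latest == {"my-namespace": "1.8.0", "my-extension": "0.5.0"}

# Caveat of string ordering: "2.10.0" sorts before "2.9.0" because "1" < "9".
surprising = latest_cached_version(["2.9.0", "2.10.0"])
```

If the ordering really is lexicographic, as assumed here, a file that caches both 2.9.0 and 2.10.0 of a namespace would report 2.9.0 as the latest.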
- Raises:
ValueError – if both path and file are supplied but path is not the same as the path of file.
- Parameters:
- Returns:
dict mapping names to versions of the namespaces in the file
- Return type:
- write(container, cache_spec=True, link_data=True, exhaust_dci=True, herd=None, expandable=('VectorData', 'ElementIdentifiers'))
Write the container to an HDF5 file.
- Parameters:
container (Container) – the Container object to write
cache_spec (bool) – If True (default), cache specification to file (highly recommended). If False, do not cache specification to file. The appropriate specification will then need to be loaded prior to reading the file.
link_data (bool) – If True (default), create external links to HDF5 Datasets. If False, copy HDF5 Datasets.
exhaust_dci (bool) – If True (default), exhaust DataChunkIterators one at a time. If False, exhaust them concurrently.
herd (HERD) – A HERD object to populate with references.
expandable (list or tuple) – A list of data type names whose datasets (and subclasses) will be created as expandable: maxshape is set based on the matching shape defined in the spec. Default is (“VectorData”, “ElementIdentifiers”), so only DynamicTable columns and id are expandable. Pass an empty list/tuple to disable automatic expansion entirely.
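The difference between the two exhaust_dci settings can be pictured with ordinary Python iterables. This is a conceptual sketch, not hdmf's write queue: the function names are hypothetical, and "concurrently" is modeled here as round-robin interleaving, which is an assumption.

```python
def drain_one_at_a_time(sources):
    """Consume each iterable fully before moving on to the next one."""
    log = []
    for name, source in sources.items():
        for chunk in source:
            log.append((name, chunk))
    return log


def drain_interleaved(sources):
    """Take one chunk from each iterable in turn until all are exhausted."""
    log = []
    pending = {name: iter(source) for name, source in sources.items()}
    while pending:
        for name in list(pending):
            try:
                chunk = next(pending[name])
            except StopIteration:
                del pending[name]  # this source has no more chunks
            else:
                log.append((name, chunk))
    return log


sources = {"a": [1, 2, 3], "b": [10, 20]}
sequential = drain_one_at_a_time(sources)
interleaved = drain_interleaved(sources)
```

With the sample data, the sequential log lists all of "a" before any of "b", while the interleaved log alternates between the two until "b" runs out.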
- export(src_io, container=None, write_args=None, cache_spec=True)
Export data read from a file from any backend to HDF5.
See hdmf.backends.io.HDMFIO.export for more details.
- Parameters:
src_io (HDMFIO) – the HDMFIO object for reading the data to export
container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported
write_args (dict) – arguments to pass to write_builder
cache_spec (bool) – whether to cache the specification to file
- classmethod export_io(path, src_io, comm=None, container=None, write_args=None, cache_spec=True)
Export from one backend to HDF5 (class method).
Convenience function for export where you do not need to instantiate a new HDF5IO object for writing. An HDF5IO object is created with mode ‘w’ and the given arguments.
Example usage:
    old_io = HDF5IO('old.h5', 'r')
    HDF5IO.export_io(path='new_copy.h5', src_io=old_io)
See export for more details.
- Parameters:
path (str) – the path to the destination HDF5 file
src_io (HDMFIO) – the HDMFIO object for reading the data to export
comm (Intracomm) – the MPI communicator to use for parallel I/O
container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported
write_args (dict) – arguments to pass to write_builder
cache_spec (bool) – whether to cache the specification to file
- read()
Read a container from the IO source.
- Returns:
the Container object that was read in
- Return type:
- read_builder()
Read data and return the GroupBuilder representing it.
NOTE: On read, Builder.source will usually not be set on the Builders.
NOTE: Builder.location is used internally to ensure correct handling of links (in particular on export) and should be set on read for all GroupBuilder, DatasetBuilder, and LinkBuilder objects.
- Returns:
a GroupBuilder representing the data object
- Return type:
- get_written(builder)
Return True if this builder has been written to (or read from) disk by this IO object, False otherwise.
- Parameters:
builder (Builder) – Builder object to get the written flag for
- Returns:
True if the builder is found in self._written_builders using the builder ID, False otherwise
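The lookup described here, a registry keyed by the builder's ID, can be sketched as follows. This is an illustration, not hdmf's implementation: `WrittenRegistry`, `DummyBuilder`, and the use of Python's built-in `id()` as the builder ID are assumptions.

```python
class WrittenRegistry:
    """Hypothetical tracker for which builder objects have been written or read."""

    def __init__(self):
        # Maps id(builder) -> builder. Keeping a reference to the builder
        # ensures its id cannot be reused by another object while tracked.
        self._written_builders = {}

    def set_written(self, builder):
        self._written_builders[id(builder)] = builder

    def get_written(self, builder):
        return id(builder) in self._written_builders


class DummyBuilder:
    """Placeholder standing in for a real Builder object."""


registry = WrittenRegistry()
first, second = DummyBuilder(), DummyBuilder()
registry.set_written(first)
```

Keying on identity rather than equality means two builders with identical contents are still tracked separately.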
- get_builder(h5obj)
Get the builder for the corresponding h5py Group or Dataset
- Raises:
ValueError – When no builder has been constructed yet for the given h5py object
- Parameters:
h5obj (Dataset or Group) – the HDF5 object to get the corresponding Builder object for
- get_container(h5obj)
Get the container for the corresponding h5py Group or Dataset
- Raises:
ValueError – When no builder has been constructed yet for the given h5py object
- Parameters:
h5obj (Dataset or Group) – the HDF5 object to get the corresponding Container/Data object for
- open()
Open this HDMFIO object for writing of the builder
- close(close_links=True)
Close this file and any files linked to from this file.
- Parameters:
close_links (bool) – Whether to close all files linked to from this file. (default: True)
- close_linked_files()
Close all opened, linked-to files.
MacOS and Linux automatically release the linked-to file after the linking file is closed, but Windows does not, which prevents the linked-to file from being deleted or truncated. Use this method to close all opened, linked-to files.
- write_builder(builder, link_data=True, exhaust_dci=True, export_source=None, expandable=('VectorData', 'ElementIdentifiers'))
- Parameters:
builder (GroupBuilder) – the GroupBuilder object representing the HDF5 file
link_data (bool) – If not specified otherwise link (True) or copy (False) HDF5 Datasets
exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, exhaust them concurrently
export_source (str) – The source of the builders when exporting
expandable (list or tuple) – A list of data type names whose datasets (and subclasses) will be created as expandable: maxshape is set based on the matching shape defined in the spec. Default is (“VectorData”, “ElementIdentifiers”), so only DynamicTable columns and id are expandable. Pass an empty list/tuple to disable automatic expansion entirely.
- classmethod get_type(data)
- set_attributes(obj, attributes)
- write_group(parent, builder, link_data=True, exhaust_dci=True, export_source=None, expandable=('VectorData', 'ElementIdentifiers'))
- Parameters:
parent (Group) – the parent HDF5 object
builder (GroupBuilder) – the GroupBuilder to write
link_data (bool) – If not specified otherwise link (True) or copy (False) HDF5 Datasets
exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, exhaust them concurrently
export_source (str) – The source of the builders when exporting
expandable (list or tuple) – A list of data type names whose datasets (and subclasses) will be created as expandable: maxshape is set based on the matching shape defined in the spec. Default is (“VectorData”, “ElementIdentifiers”), so only DynamicTable columns and id are expandable. Pass an empty list/tuple to disable automatic expansion entirely.
- Returns:
the Group that was created
- Return type:
- write_link(parent, builder, export_source=None)
- Parameters:
parent (Group) – the parent HDF5 object
builder (LinkBuilder) – the LinkBuilder to write
export_source (str) – The source of the builders when exporting
- Returns:
the Link that was created
- Return type:
- write_dataset(parent, builder, link_data=True, exhaust_dci=True, export_source=None, expandable=('VectorData', 'ElementIdentifiers'))
Write a dataset to HDF5
The function uses other dataset-dependent write functions, e.g., __scalar_fill__, __list_fill__, and __setup_chunked_dset__ to write the data.
- Parameters:
parent (Group) – the parent HDF5 object
builder (DatasetBuilder) – the DatasetBuilder to write
link_data (bool) – If not specified otherwise link (True) or copy (False) HDF5 Datasets
exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, exhaust them concurrently
export_source (str) – The source of the builders when exporting
expandable (list or tuple) – A list of data type names whose datasets (and subclasses) will be created as expandable: maxshape is set based on the matching shape defined in the spec. Default is (“VectorData”, “ElementIdentifiers”), so only DynamicTable columns and id are expandable. Pass an empty list/tuple to disable automatic expansion entirely.
- Returns:
the Dataset that was created
- Return type:
- static compute_default_chunk_shape(data_shape, dtype, target_chunk_bytes=4194304, neurodata_type=None)
Compute a chunk shape targeting a given number of bytes per chunk.
h5py’s default auto-chunking targets 8-500 KB chunks for datasets under 100 GB, depending on dataset size. This is too small for cloud access where each chunk may require a separate HTTP range request. This method targets larger chunks (default 4 MB) in the recommended 2-16 MB range for cloud-hosted files.
The algorithm keeps all dimensions except the first at their full size and adjusts the first dimension to reach the target chunk size. When a single slice along the first dimension already exceeds the target (e.g. mesoscale imaging frames), trailing dimensions are halved in place of the largest axis until the chunk fits within the target.
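The algorithm described above can be approximated in a few lines of pure Python. This is a sketch under stated assumptions, not hdmf's implementation: `sketch_chunk_shape` is a hypothetical name, it takes the element size in bytes directly instead of a dtype, and it interprets the halving step as "repeatedly halve whichever trailing axis is currently largest".

```python
import math


def sketch_chunk_shape(data_shape, itemsize, target_chunk_bytes=4 * 1024 * 1024):
    """Approximate a chunk shape of roughly target_chunk_bytes (illustrative only)."""
    # Fall back to auto-chunking (True) when no shape can be computed.
    if len(data_shape) == 0 or itemsize <= 0 or any(n == 0 for n in data_shape[1:]):
        return True

    chunk = list(data_shape)

    def slice_bytes():
        # Bytes in one slice along the first dimension.
        return itemsize * math.prod(chunk[1:])

    # If one slice is already too big, halve the largest trailing axis
    # until the slice fits or every trailing axis has length 1.
    while slice_bytes() > target_chunk_bytes and max(chunk[1:], default=1) > 1:
        axis = max(range(1, len(chunk)), key=lambda i: chunk[i])
        chunk[axis] = math.ceil(chunk[axis] / 2)

    # Choose as many slices along the first dimension as fit in the target.
    rows = max(1, target_chunk_bytes // slice_bytes())
    chunk[0] = max(1, min(data_shape[0], rows))
    return tuple(chunk)
```

For a long, narrow table of 8-byte values with shape (1000000, 4), the sketch returns (131072, 4), which is exactly 4 MB per chunk. For 2-byte image frames with shape (100, 4096, 4096), a single frame is 32 MB, so the trailing axes are halved down to one (1, 1024, 2048) chunk of 4 MB.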
- Parameters:
data_shape (tuple) – The shape of the dataset.
dtype (numpy.dtype or type) – The data type, used to determine bytes per element.
target_chunk_bytes (int) – Target chunk size in bytes. Default is 4 MB.
neurodata_type (str) – Name of the neurodata type for this dataset. Unused by the default implementation; provided as a hook so subclasses can specialize chunking per type.
- Returns:
The computed chunk shape, or True to fall back to h5py auto-chunking when a shape cannot be computed (unsupported dtype or zero-length trailing dimension).
- Return type:
- property mode
Return the HDF5 file mode. One of (“w”, “r”, “r+”, “a”, “w-”, “x”).
- classmethod set_dataio(data=None, maxshape=None, chunks=None, compression=None, compression_opts=None, fillvalue=None, shuffle=None, fletcher32=None, link_data=False, allow_plugin_filters=False, shape=None, dtype=None)
Wrap the given Data object with an H5DataIO.
This method is provided merely for convenience. It is the equivalent of the following:
    from hdmf.backends.hdf5 import H5DataIO
    data = ...
    data = H5DataIO(data)
- Parameters:
data (ndarray or list or tuple or Dataset or Iterable) – the data to be written. NOTE: If an h5py.Dataset is used, all other settings but link_data will be ignored as the dataset will either be linked to or copied as is in H5DataIO.
maxshape (tuple) – Dataset will be resizable up to this shape (Tuple). Automatically enables chunking. Use None for the axes you want to be unlimited.
chunks (bool or tuple) – Chunk shape or True to enable auto-chunking
compression (str or bool or int) – Compression strategy. If a bool is given, then gzip compression will be used by default. http://docs.h5py.org/en/latest/high/dataset.html#dataset-compression
compression_opts (int or tuple) – Parameter for compression filter
fillvalue (None) – Value to be returned when reading uninitialized parts of the dataset
shuffle (bool) – Enable shuffle I/O filter. http://docs.h5py.org/en/latest/high/dataset.html#dataset-shuffle
fletcher32 (bool) – Enable fletcher32 checksum. http://docs.h5py.org/en/latest/high/dataset.html#dataset-fletcher32
link_data (bool) – If data is an h5py.Dataset should it be linked to or copied. NOTE: This parameter is only allowed if data is an h5py.Dataset
allow_plugin_filters (bool) – Enable passing dynamically loaded filters as compression parameter
shape (tuple) – the shape of the new dataset, used only if data is None
dtype (str or type or dtype) – the data type of the new dataset, used only if data is None
- static generate_dataset_html(dataset)
Generates an html representation for a dataset for the HDF5IO class