hdmf.data_utils module

class hdmf.data_utils.AbstractDataChunkIterator

Bases: object

Abstract iterator class used to iterate over DataChunks.

Derived classes must ensure that all abstract methods and abstract properties are implemented, in particular dtype, maxshape, __iter__, __next__, recommended_chunk_shape, and recommended_data_shape.

__iter__()

Return the iterator object

__next__()

Return the next data chunk or raise a StopIteration exception if all chunks have been retrieved.

HINT: numpy.s_ provides a convenient way to generate index tuples using standard array slicing. This is often useful for defining the DataChunk.selection of the current chunk.

Returns: DataChunk object with the data and selection of the current chunk
Return type: DataChunk
recommended_chunk_shape()

Recommend the chunk shape for the data array.

Returns: NumPy-style shape tuple describing the recommended shape for the chunks of the target array, or None. This may or may not be the same as the shape of the chunks returned in the iteration process.
recommended_data_shape()

Recommend the initial shape for the data array.

This is useful in particular to avoid repeated resizing of the target array when reading from this data iterator. This should typically be either the final size of the array or the known minimal shape of the array.

Returns: NumPy-style shape tuple indicating the recommended initial shape for the target array. This may or may not be the final full shape of the array, i.e., the array is allowed to grow. This should not be None.
dtype

Define the data type of the array

Returns: NumPy-style dtype or otherwise compliant dtype string
maxshape

Property describing the maximum shape of the data array that is being iterated over

Returns: NumPy-style shape tuple indicating the maximum dimensions up to which the dataset may be resized. Axes with None are unlimited.
class hdmf.data_utils.DataChunkIterator(data=None, maxshape=None, dtype=None, buffer_size=1, iter_axis=0)

Bases: hdmf.data_utils.AbstractDataChunkIterator

Custom iterator class used to iterate over chunks of data.

This default implementation of AbstractDataChunkIterator accepts any iterable and assumes that we iterate over a single dimension of the data array (default: the first dimension). DataChunkIterator supports buffered read, i.e., multiple values from the input iterator can be combined to a single chunk. This is useful for buffered I/O operations, e.g., to improve performance by accumulating data in memory and writing larger blocks at once.

Initialize the DataChunkIterator.
If ‘data’ is an iterator and ‘dtype’ is not specified, then next is called on the iterator in order to determine the dtype of the data.
Parameters:
  • data (None) – The data object used for iteration
  • maxshape (tuple) – The maximum shape of the full data array. Use None to indicate unlimited dimensions
  • dtype (dtype) – The Numpy data type for the array
  • buffer_size (int) – Number of values to be buffered in a chunk
  • iter_axis (int) – The dimension to iterate over
classmethod from_iterable(data=None, maxshape=None, dtype=None, buffer_size=1, iter_axis=0)
Parameters:
  • data (None) – The data object used for iteration
  • maxshape (tuple) – The maximum shape of the full data array. Use None to indicate unlimited dimensions
  • dtype (dtype) – The Numpy data type for the array
  • buffer_size (int) – Number of values to be buffered in a chunk
  • iter_axis (int) – The dimension to iterate over
next()

Return the next data chunk or raise a StopIteration exception if all chunks have been retrieved.

HINT: numpy.s_ provides a convenient way to generate index tuples using standard array slicing. This is often useful for defining the DataChunk.selection of the current chunk.

Returns: DataChunk object with the data and selection of the current chunk
Return type: DataChunk
recommended_chunk_shape()

Recommend a chunk shape.

To optimize iterative writes, the chunk shape should be aligned with the common shape of chunks returned by __next__ or, if those chunks are too large, with a well-aligned subset of those chunks. It may also be any other value if one wants to recommend chunk shapes that optimize read rather than write. The default implementation returns None, indicating no preferred chunking option.
recommended_data_shape()
Recommend an initial shape for the data. This is useful when progressively writing data and we want to recommend an initial size for the dataset.
maxshape

Get a shape tuple describing the maximum shape of the array described by this DataChunkIterator. If an iterator is provided and no data has been read yet, then the first chunk will be read (i.e., next will be called on the iterator) in order to determine the maxshape.

Returns: Shape tuple. None is used for dimensions where the maximum shape is not known or unlimited.
dtype

Get the value data type

Returns: np.dtype object describing the data type
class hdmf.data_utils.DataChunk(data=None, selection=None)

Bases: object

Class used to describe a data chunk. Used in DataChunkIterator.

Parameters:
  • data (ndarray) – Numpy array with the data value(s) of the chunk
  • selection (None) – Numpy index tuple describing the location of the chunk
astype(dtype)

Get a new DataChunk with self.data converted to the given dtype.

dtype

Data type of the values in the chunk

Returns: np.dtype of the values in the DataChunk
hdmf.data_utils.assertEqualShape(data1, data2, axes1=None, axes2=None, name1=None, name2=None, ignore_undetermined=True)

Ensure that the shapes of data1 and data2 match along the given dimensions.

Parameters:
  • data1 (List, Tuple, np.ndarray, DataChunkIterator etc.) – The first input array
  • data2 (List, Tuple, np.ndarray, DataChunkIterator etc.) – The second input array
  • name1 – Optional string with the name of data1
  • name2 – Optional string with the name of data2
  • axes1 (int, Tuple of ints, List of ints, or None) – The dimensions of data1 that should be matched to the dimensions of data2. Set to None to compare all axes in order.
  • axes2 – The dimensions of data2 that should be matched to the dimensions of data1. Must have the same length as axes1. Set to None to compare all axes in order.
  • ignore_undetermined – Boolean indicating whether non-matching unlimited dimensions should be ignored, i.e., if two dimensions don't match because we can't determine the shape of either one, should that case be ignored or treated as a mismatch
Returns:

A ShapeValidatorResult whose result attribute indicates whether the check passed and whose message attribute describes the matching process

class hdmf.data_utils.ShapeValidatorResult(result=False, message=None, ignored=(), unmatched=(), error=None, shape1=(), shape2=(), axes1=(), axes2=())

Bases: object

Class for storing results from validating the shape of multi-dimensional arrays.

This class is used to store results generated by ShapeValidator

Variables:
  • result – Boolean indicating whether results matched or not
  • message – Message indicating the result of the matching procedure
Parameters:
  • result (bool) – Result of the shape validation
  • message (str) – Message describing the result of the shape validation
  • ignored (tuple) – Axes that have been ignored in the validation process
  • unmatched (tuple) – List of axes that did not match during shape validation
  • error (str) – Error that may have occurred. One of ERROR_TYPE
  • shape1 (tuple) – Shape of the first array for comparison
  • shape2 (tuple) – Shape of the second array for comparison
  • axes1 (tuple) – Axes for the first array that should match
  • axes2 (tuple) – Axes for the second array that should match
SHAPE_ERROR = {None: 'All required axes matched', 'NUM_DIMS_ERROR': 'Unequal number of dimensions.', 'NUM_AXES_ERROR': 'Unequal number of axes for comparison.', 'AXIS_OUT_OF_BOUNDS': 'Axis index for comparison out of bounds.', 'AXIS_LEN_ERROR': 'Unequal length of axes.'}

Dict whose keys are the types of errors that may occur during shape comparison and whose values are strings with the default error message for each type.

class hdmf.data_utils.DataIO(data=None)

Bases: object

Base class for wrapping data arrays for I/O. Derived classes of DataIO are typically used to pass dataset-specific I/O parameters to the particular HDMFIO backend.

Parameters: data (ndarray or list or tuple or Dataset or HDMFDataset or AbstractDataChunkIterator) – the data to be written
get_io_params()

Returns a dict with the I/O parameters specified in this DataIO.

data

Get the wrapped data object

__getitem__(item)

Delegate slicing to the data object

valid

bool indicating if the data object is valid

exception hdmf.data_utils.InvalidDataIOError

Bases: Exception