The Hierarchical Data Modeling Framework¶
HDMF is a Python package for standardizing, reading, and writing hierarchical object data.
HDMF is a by-product of the Neurodata Without Borders (NWB) project. The goal of NWB was to enable collaborative science within the neurophysiology and systems neuroscience communities through data standardization. The team of neuroscientists and software developers involved with NWB recognize that adoption of a unified data format is an important step toward breaking down the barriers to data sharing in neuroscience. HDMF was central to the NWB development efforts, and has since been split off with the intention of providing it as an open-source tool for other scientific communities.
If you use HDMF in your research, please use the following citation:
A. J. Tritt et al., “HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards,” 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019, pp. 165-179, doi: 10.1109/BigData47090.2019.9005648.
Installing HDMF¶
HDMF requires having Python 3.8, 3.9, 3.10, 3.11, or 3.12 installed. If you don’t have Python installed and want the simplest way to get started, we recommend you install and use the Anaconda Distribution. It includes Python, NumPy, and many other commonly used packages for scientific computing and data science.
HDMF can be installed with pip, conda, or from source. HDMF works on Windows, macOS, and Linux.
Option 1: Using pip¶
If you are a beginner to programming in Python and using Python tools, we recommend that you install HDMF by running the following command in a terminal or command prompt:
pip install hdmf
Option 2: Using conda¶
You can also install HDMF using conda by running the following command in a terminal or command prompt:
conda install -c conda-forge hdmf
Tutorials¶
GenericDataChunkIterator Tutorial¶
This is a tutorial for interacting with GenericDataChunkIterator objects. This tutorial is written for beginners and does not describe the full capabilities and nuances of the functionality. It is designed to give you basic familiarity with how GenericDataChunkIterator works and help you get started with creating a specific instance for your data format or API access pattern.
Introduction¶
The GenericDataChunkIterator class represents a semi-abstract version of an AbstractDataChunkIterator that automatically handles the selection of buffer regions and resolves communication of compatible chunk regions within an H5DataIO wrapper. It does not, however, know how data (values) or metadata (data type, full shape) ought to be directly accessed. This is intentional, so that the class remains fully agnostic to a range of indexing methods and format-independent APIs, rather than making strong assumptions about how data ranges are to be sliced.
Constructing a simple child class¶
We will begin with a simple example of data access to a standard NumPy array. To create a GenericDataChunkIterator that accomplishes this, we begin by defining our child class.
import numpy as np
from hdmf.data_utils import GenericDataChunkIterator
class NumpyArrayDataChunkIterator(GenericDataChunkIterator):
    def __init__(self, array: np.ndarray, **kwargs):
        self.array = array
        super().__init__(**kwargs)

    def _get_data(self, selection):
        return self.array[selection]

    def _get_maxshape(self):
        return self.array.shape

    def _get_dtype(self):
        return self.array.dtype
# To instantiate this class on an array to allow iteration over buffer_shapes,
my_array = np.random.randint(low=0, high=10, size=(12, 6), dtype="int16")
my_custom_iterator = NumpyArrayDataChunkIterator(array=my_array)
# and this iterator now behaves as a standard Python generator (i.e., it can only be exhausted once)
# that returns DataChunk objects for each buffer.
for buffer in my_custom_iterator:
    print(buffer.data)
[[1 5 0 9 3 5]
[9 5 0 4 1 0]
[5 5 6 2 6 7]
[4 9 7 1 5 9]
[9 9 8 3 4 9]
[4 7 6 5 4 5]
[9 9 3 8 8 4]
[1 9 3 5 7 7]
[9 3 1 1 6 0]
[3 0 3 3 2 8]
[7 3 7 4 7 4]
[7 8 8 7 9 7]]
Intended use for advanced data I/O¶
Of course, the real use case for this class is when the amount of data stored on disk is larger than what can be read into RAM. Hence the goal is to read, per iteration, only an amount of data whose size in gigabytes (GB) is at or below the buffer_gb argument (default 1 GB).
# This design can be seen if we increase the amount of data in our example code
my_array = np.random.randint(low=0, high=10, size=(20000, 5000), dtype="int32")
my_custom_iterator = NumpyArrayDataChunkIterator(array=my_array, buffer_gb=0.2)
for j, buffer in enumerate(my_custom_iterator, start=1):
    print(f"Buffer number {j} returns data from selection: {buffer.selection}")
Buffer number 1 returns data from selection: (slice(0, 12640, None), slice(0, 3160, None))
Buffer number 2 returns data from selection: (slice(0, 12640, None), slice(3160, 5000, None))
Buffer number 3 returns data from selection: (slice(12640, 20000, None), slice(0, 3160, None))
Buffer number 4 returns data from selection: (slice(12640, 20000, None), slice(3160, 5000, None))
Note
Technically, in this example the total data is still fully loaded into RAM from the initial NumPy array. A more accurate use case would be achieved by writing my_array to a temporary file on your system and loading it back with np.memmap, a NumPy array subclass that does not immediately load the data into memory.
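For illustration, here is a minimal sketch of that memmap-based pattern using the NumpyArrayDataChunkIterator defined above; the file path, shape, and dtype below are arbitrary placeholders.
import os
import tempfile

# placeholder path for a temporary memory-mapped file
memmap_path = os.path.join(tempfile.gettempdir(), "example_memmap.dat")
shape, dtype = (2000, 500), "int16"

# write some example data to disk once
writable = np.memmap(memmap_path, dtype=dtype, mode="w+", shape=shape)
writable[:] = np.random.randint(low=0, high=10, size=shape, dtype=dtype)
writable.flush()
del writable

# re-open lazily; values are only read from disk when _get_data slices the array
lazy_array = np.memmap(memmap_path, dtype=dtype, mode="r", shape=shape)
lazy_iterator = NumpyArrayDataChunkIterator(array=lazy_array)
for buffer in lazy_iterator:
    pass  # each buffer is pulled from disk on demand

os.remove(memmap_path)  # clean up the temporary file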
Writing to an HDF5 file with full control of shape arguments¶
The true purpose of returning data selections in this form, wrapped in a DataChunk object, is to write them piecewise to an HDF5 dataset.
# This is where the importance of the underlying `chunk_shape` comes in, and why it is critical to performance
# that it perfectly subsets the `buffer_shape`.
import h5py
maxshape = (20000, 5000)
buffer_shape = (10000, 2500)
chunk_shape = (1000, 250)
my_array = np.random.randint(low=0, high=10, size=maxshape, dtype="int32")
my_custom_iterator = NumpyArrayDataChunkIterator(array=my_array, buffer_shape=buffer_shape, chunk_shape=chunk_shape)
out_file = "my_temporary_test_file.hdf5"
with h5py.File(name=out_file, mode="w") as f:
    dset = f.create_dataset(name="test", shape=maxshape, dtype="int16", chunks=my_custom_iterator.chunk_shape)
    for buffer in my_custom_iterator:
        dset[buffer.selection] = buffer.data
# Remember to remove the temporary file after running this and exploring the contents!
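As a quick sanity check (not part of the original example), we can read the dataset back and compare it to the source array before removing the file.
# optional sanity check: read the dataset back and compare it to the source array
with h5py.File(name=out_file, mode="r") as f:
    round_tripped = f["test"][:]
assert np.array_equal(round_tripped, my_array)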
Note
Here we explicitly set the chunks value on the HDF5 dataset; however, a nice part of the design of this iterator is that when it is wrapped in an hdmf.backends.hdf5.h5_utils.H5DataIO that is used within an hdmf.backends.hdf5.h5tools.HDF5IO context with a corresponding hdmf.container.Container, these details will be parsed automatically.
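For reference, the wrapping step might look like the following sketch; the compression setting is just an illustrative option, and the subsequent write through HDF5IO with a Container is omitted here.
from hdmf.backends.hdf5.h5_utils import H5DataIO

# sketch: wrap a fresh iterator so an HDF5 backend can pick up the chunking (and, e.g., compression) settings
another_iterator = NumpyArrayDataChunkIterator(array=my_array, buffer_shape=buffer_shape, chunk_shape=chunk_shape)
wrapped_data = H5DataIO(data=another_iterator, compression="gzip")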
Note
There is some overlap in nomenclature between HDMF and HDF5. The term chunk in both HDMF and HDF5 refers to a subset of a dataset; however, in HDF5 a chunk is a piece of the dataset on disk, whereas a DataChunk in HDMF is a block of data in memory. As such, the requirements on the shape and size of chunks are different. In HDF5, chunks are pieces of a dataset that get compressed and cached together, and they should usually be small for optimal performance (typically 1 MB or less). In contrast, a DataChunk in HDMF acts as a block of data for writing to a dataset and spans multiple HDF5 chunks to improve performance. To avoid repeated updates to the same chunk in the HDF5 file, DataChunk objects used for write should align with the HDF5 chunks, i.e., the DataChunk.selection should fully cover one or more HDF5 chunks. This is what the buffer of the GenericDataChunkIterator does: on each iteration it returns a single DataChunk object (by default up to 1 GB) that perfectly spans many HDF5 chunks (by default under 1 MB each), reducing the number of small I/O operations and improving performance. In practice, the buffer should usually be even larger than the default, i.e., as much free RAM as can safely be used.
Remove the test file
import os
if os.path.exists(out_file):
    os.remove(out_file)
DynamicTable Tutorial¶
This is a tutorial for interacting with DynamicTable objects. This tutorial is written for beginners and does not describe the full capabilities and nuances of DynamicTable functionality. Please see the DynamicTable How-To Guide for more complete documentation. This tutorial is designed to give you basic familiarity with how DynamicTable works and help you get started with creating a DynamicTable, adding columns and rows to a DynamicTable, and accessing data in a DynamicTable.
Introduction¶
The DynamicTable class represents a column-based table to which you can add custom columns. It consists of a name, a description, a list of row IDs, and a list of columns.
Constructing a table¶
To create a DynamicTable, call the constructor for DynamicTable with a string name and a string description.
from hdmf.common import DynamicTable
users_table = DynamicTable(
name='users',
description='a table containing data/metadata about users, one user per row',
)
Adding columns¶
You can add columns to a DynamicTable using DynamicTable.add_column.
users_table.add_column(
name='first_name',
description='the first name of the user',
)
users_table.add_column(
name='last_name',
description='the last name of the user',
)
Adding ragged array columns¶
You may want to add columns to your table that have a different number of entries per row.
This is called a “ragged array column”. To do this, pass index=True to DynamicTable.add_column.
users_table.add_column(
name='phone_number',
description='the phone number of the user',
index=True,
)
Adding rows¶
You can add rows to a DynamicTable using DynamicTable.add_row.
You must pass in a keyword argument for every column in the table.
Ragged array column arguments should be passed in as lists or numpy arrays.
The ID of the row will automatically be set and incremented for every row,
starting at 0.
# id will be set to 0 automatically
users_table.add_row(
first_name='Grace',
last_name='Hopper',
phone_number=['123-456-7890'],
)
# id will be set to 1 automatically
users_table.add_row(
first_name='Alan',
last_name='Turing',
phone_number=['555-666-7777', '888-111-2222'],
)
Displaying the table contents as a pandas DataFrame¶
pandas is a popular data analysis tool for working with tabular data.
Convert your DynamicTable to a pandas DataFrame using DynamicTable.to_dataframe.
users_df = users_table.to_dataframe()
users_df
Accessing the table as a DataFrame provides you with the powerful methods from pandas for indexing, selecting, and querying tabular data.
Get the “last_name” column as a pandas Series:
users_df['last_name']
id
0 Hopper
1 Turing
Name: last_name, dtype: object
The index of the DataFrame is automatically set to the table IDs. Get the row with ID = 0 as a pandas Series:
users_df.loc[0]
first_name Grace
last_name Hopper
phone_number [123-456-7890]
Name: 0, dtype: object
Get single cells of the table by indexing with both ID and column name:
print('My first user:', users_df.loc[0, 'first_name'], users_df.loc[0, 'last_name'])
My first user: Grace Hopper
Adding columns that reference rows of other DynamicTable objects¶
You can create a column that references rows of another DynamicTable. This is analogous to a foreign key in a relational database. To do this, use the table keyword argument for DynamicTable.add_column and set it to the other table.
# create a new table of users
users_table = DynamicTable(
name='users',
description='a table containing data/metadata about users, one user per row',
)
# add simple columns to this table
users_table.add_column(
name='first_name',
description='the first name of the user',
)
users_table.add_column(
name='last_name',
description='the last name of the user',
)
# create a new table of addresses to reference
addresses_table = DynamicTable(
name='addresses',
description='a table containing data/metadata about addresses, one address per row',
)
addresses_table.add_column(
name='street_address',
description='the street number and address',
)
addresses_table.add_column(
name='city',
description='the city of the address',
)
# add rows to the addresses table
addresses_table.add_row(
street_address='123 Main St',
city='Springfield'
)
addresses_table.add_row(
street_address='45 British Way',
city='London'
)
# add a column to the users table that references rows of the addresses table
users_table.add_column(
name='address',
description='the address of the user',
table=addresses_table
)
# add rows to the users table
users_table.add_row(
first_name='Grace',
last_name='Hopper',
address=0 # <-- row index of the address table
)
users_table.add_row(
first_name='Alan',
last_name='Turing',
address=1 # <-- row index of the address table
)
Displaying the contents of a table with references to another table¶
Earlier, we converted a DynamicTable to a DataFrame using DynamicTable.to_dataframe and printed the DataFrame to see its contents. This also works when the DynamicTable contains a column that references another table. However, the entries for this column for each row will be printed as a nested DataFrame. This can be difficult to read, so to view only the row indices of the referenced table, pass index=True to DynamicTable.to_dataframe.
users_df = users_table.to_dataframe(index=True)
users_df
You can then access the referenced table using the table attribute of the column object. This is useful when reading a table from a file where you may not have a variable to access the referenced table. First, use DynamicTable.__getitem__ (square bracket notation) to get the DynamicTableRegion object representing the column. Then access its table attribute to get the addresses table and convert the table to a DataFrame.
address_column = users_table['address']
read_addresses_table = address_column.table
addresses_df = read_addresses_table.to_dataframe()
Get the addresses corresponding to the rows of the users table:
address_indices = users_df['address'] # pandas Series of row indices into the addresses table
addresses_df.iloc[address_indices] # use .iloc because these are row indices not ID values
Note
The indices returned by users_df['address'] are row indices and not the ID values of the table. However, if you are using default IDs, these values will be the same.
You now know the basics of creating DynamicTable objects and reading data from them, including tables that have ragged array columns and references to other tables. Learn more about working with DynamicTable in the DynamicTable How-To Guide, including:
ragged array columns with references to other tables
nested ragged array columns
columns with multidimensional array data
columns with enumerated (categorical) data
accessing data and properties from the column objects directly
writing and reading tables to a file
writing expandable tables
defining subclasses of DynamicTable
MultiContainerInterface¶
This is a guide to creating custom API classes with the MultiContainerInterface class.
Introduction¶
The MultiContainerInterface class provides an easy and convenient way to create standard methods for a container class that contains a collection of containers of a specified type. For example, let’s say you want to define a class MyContainerHolder that contains a collection of MyContainer objects. By having MyContainerHolder extend MultiContainerInterface and specifying certain configuration settings in the class, your MyContainerHolder class would be generated with:
an attribute for a labelled dictionary that holds MyContainer objects
an __init__ method to initialize MyContainerHolder with a collection of MyContainer objects
a method to add MyContainer objects to the dictionary
access of items from the dictionary using __getitem__ (square bracket notation)
a method to get MyContainer objects from the dictionary (optional)
a method to create MyContainer objects and add them to the dictionary (optional)
Specifying the class configuration¶
To specify the class configuration for a MultiContainerInterface subclass, define the variable __clsconf__ in the new class. __clsconf__ should be set to a dictionary with three required keys: 'attr', 'type', and 'add'.
The 'attr' key should map to a string value that is the name of the attribute that will be created to hold the collection of container objects.
The 'type' key should map to a type or a tuple of types that says what objects are allowed in this collection.
The 'add' key should map to a string value that is the name of the method to be generated that allows users to add a container to the collection.
from hdmf.container import Container, MultiContainerInterface
class ContainerHolder(MultiContainerInterface):
    __clsconf__ = {
        'attr': 'containers',
        'type': Container,
        'add': 'add_container',
    }
The above code will generate:
the attribute containers as a LabelledDict that holds Container objects
the __init__ method which accepts a collection of Container objects
the add_container method that allows users to add Container objects to the containers dictionary.
Here is an example of instantiating the new ContainerHolder class and using the generated add method.
obj1 = Container('obj1')
obj2 = Container('obj2')
holder1 = ContainerHolder()
holder1.add_container(obj1)
holder1.add_container(obj2)
holder1.containers # this is a LabelledDict where the keys are the name of the container
# i.e., {'obj1': obj1, 'obj2': obj2}
Constructor options¶
The constructor accepts a dict/list/tuple of Container objects, a single Container object, or None. If a dict is passed, only the dict values are used. You can specify the argument as a keyword argument with the attribute name as the keyword argument key.
holder2 = ContainerHolder(obj1)
holder3 = ContainerHolder([obj1, obj2])
holder4 = ContainerHolder({'unused_key1': obj1, 'unused_key2': obj2})
holder5 = ContainerHolder(containers=obj1)
By default, the new class has the ‘name’ attribute set to the name of the class, but a user-specified name can be provided in the constructor.
named_holder = ContainerHolder(name='My Holder')
Adding containers to the collection¶
Similar to the constructor, the generated add method accepts a dict/list/tuple of Container objects or a single Container object. If a dict is passed, only the dict values are used.
holder6 = ContainerHolder()
holder6.add_container(obj1)
holder7 = ContainerHolder()
holder7.add_container([obj1, obj2])
holder8 = ContainerHolder()
holder8.add_container({'unused_key1': obj1, 'unused_key2': obj2})
holder9 = ContainerHolder()
holder9.add_container(containers=obj1)
Getting items from the collection¶
You can access a container in the collection by using the name of the container within square brackets. As a convenience, if there is only one item in the collection, you can use None within square brackets.
holder10 = ContainerHolder(obj1)
holder10['obj1']
holder10[None]
Getting items from the collection using a custom getter¶
You can use the 'get' key in __clsconf__ to generate a getter method as an alternative to using the square bracket notation for accessing items from the collection. Like the square bracket notation, if there is only one item in the collection, you can omit the name or pass None to the getter method. The 'get' key should map to a string value that is the name of the getter method to be generated. The 'get' key in __clsconf__ is optional.
class ContainerHolderWithGet(MultiContainerInterface):
    __clsconf__ = {
        'attr': 'containers',
        'type': Container,
        'add': 'add_container',
        'get': 'get_container',
    }
holder11 = ContainerHolderWithGet(obj1)
holder11.get_container('obj1')
holder11.get_container()
Creating and adding items to the collection using a custom create method¶
You can use the 'create' key in __clsconf__ to generate a create method as a convenience so that users do not need to initialize the Container object and then add it to the collection; those two steps are combined into one line. The arguments to the custom create method are the same as the arguments to the Container’s __init__ method, but the __init__ method must be defined using docval. The created object will be returned by the create method. The 'create' key should map to a string value that is the name of the create method to be generated. The 'create' key in __clsconf__ is optional.
class ContainerHolderWithCreate(MultiContainerInterface):
    __clsconf__ = {
        'attr': 'containers',
        'type': Container,
        'add': 'add_container',
        'create': 'create_container',
    }
holder12 = ContainerHolderWithCreate()
holder12.create_container('obj1')
Specifying multiple types allowed in the collection¶
The 'type' key in __clsconf__ allows specifying a single type or a list/tuple of types. You cannot specify the 'create' key in __clsconf__ when multiple types are allowed in the collection, because it cannot be determined which type to initialize.
from hdmf.container import Data
class ContainerHolderWithMultipleTypes(MultiContainerInterface):
    __clsconf__ = {
        'attr': 'things',
        'type': (Container, Data),
        'add': 'add_thing',
    }
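A short usage sketch for this class; obj1 is the Container created earlier, and the Data object here is just an illustrative placeholder.
holder_mixed = ContainerHolderWithMultipleTypes()
holder_mixed.add_thing(obj1)                                  # a Container is allowed
holder_mixed.add_thing(Data(name='my_data', data=[1, 2, 3]))  # a Data object is allowed too
holder_mixed.things  # LabelledDict containing both objects, keyed by name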
Specifying multiple collections¶
You can support multiple collections in your MultiContainerInterface subclass by setting the __clsconf__ variable to a list of dicts instead of a single dict. When specifying multiple collections, square bracket notation access of items (i.e., calling __getitem__) is not supported, because it is not clear which collection to get the desired item from.
from hdmf.container import Data
class MultiCollectionHolder(MultiContainerInterface):
    __clsconf__ = [
        {
            'attr': 'containers',
            'type': Container,
            'add': 'add_container',
        },
        {
            'attr': 'data',
            'type': Data,
            'add': 'add_data',
        },
    ]
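A brief usage sketch: items are added through the per-collection add methods, since square bracket access is not generated when there are multiple collections. The Data object below is an illustrative placeholder.
holder_multi = MultiCollectionHolder()
holder_multi.add_container(obj2)                               # stored in holder_multi.containers
holder_multi.add_data(Data(name='more_data', data=[4, 5, 6]))  # stored in holder_multi.data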
Managing container parents¶
If the parent of the container being added is not already set, then the parent will be set to the containing object.
obj3 = Container('obj3')
holder13 = ContainerHolder(obj3)
obj3.parent # this is holder13
LabelledDict objects support removal of an item using the del operator or the pop method. If the parent of the container being removed is the containing object, then its parent will be reset to None.
del holder13.containers['obj3']
obj3.parent # this is back to None
Using a custom constructor¶
You can override the automatically generated constructor for your MultiContainerInterface subclass.
class ContainerHolderWithCustomInit(MultiContainerInterface):
    __clsconf__ = {
        'attr': 'containers',
        'type': Container,
        'add': 'add_container',
    }

    def __init__(self, name, my_containers):
        super().__init__(name=name)
        self.containers = my_containers
        self.add_container(Container('extra_container'))
holder14 = ContainerHolderWithCustomInit('my_name', [obj1, obj2])
holder14.containers # contains the 'extra_container' container
TermSet¶
This is a user guide for interacting with the TermSet and TermSetWrapper classes. The TermSet and TermSetWrapper types are experimental and are subject to change in future releases. If you use these types, please provide feedback to the HDMF team so that we can improve the structure and overall capabilities.
Introduction¶
The TermSet class provides a way for users to create their own set of terms from brain atlases, species taxonomies, and anatomical, cell, and gene function ontologies. Users can validate their data and attributes against their own set of terms, ensuring clean data that can later be used in line with the FAIR principles. The TermSet class allows for a reusable and sharable pool of metadata to serve as references for any dataset or attribute. The TermSet class is used closely with HERD to more efficiently map terms to data.
In order to actually use a TermSet, users wrap data and attributes with a TermSetWrapper. The TermSetWrapper uses a user-provided TermSet to perform validation. TermSet is built upon resources from LinkML, a modeling language that uses YAML-based schemas, giving TermSet a standardized structure and a variety of tools to help the user manage their references.
How to make a TermSet Schema¶
Before the user can take advantage of all the wonders within the TermSet class, the user needs to create a LinkML schema (YAML) that provides all the permissible term values. Please refer to https://linkml.io/linkml/intro/tutorial06.html to learn more about how LinkML structures their schema.
The name of the schema is up to the user, e.g., the name could be “Species” if the term set will contain species terms.
The prefixes will be the standardized prefix of your source, followed by the URI to the terms. For example, the NCBI Taxonomy is abbreviated as NCBI_TAXON, and Ensemble is simply Ensemble. As mentioned prior, the URI needs to be to the terms; this is to allow the URI to later be coupled with the source id for the term to create a valid link to the term source page.
The schema uses LinkML enumerations to list all the possible terms. To define all the permissible values, the user can define them manually in the schema, transfer them from a Google spreadsheet, or pull them into the schema dynamically from a LinkML supported source.
For a clear example, please view the example_term_set.yaml for this tutorial, which provides a concise example of how a term set schema looks.
Note
For more information regarding LinkML Enumerations, please refer to https://linkml.io/linkml/intro/tutorial06.html.
Note
For more information on how to properly format the Google spreadsheet to be compatible with LinkML, please refer to https://linkml.io/schemasheets/#examples.
Note
For more information on how to properly format the schema to support LinkML Dynamic Enumerations, please refer to https://linkml.io/linkml/schemas/enums.html#dynamic-enums.
from hdmf.common import DynamicTable, VectorData
import os
import numpy as np
try:
    import linkml_runtime  # noqa: F401
except ImportError as e:
    raise ImportError("Please install linkml-runtime to run this example: pip install linkml-runtime") from e
from hdmf.term_set import TermSet, TermSetWrapper
try:
    dir_path = os.path.dirname(os.path.abspath(__file__))
    yaml_file = os.path.join(dir_path, 'example_term_set.yaml')
    schemasheets_folder = os.path.join(dir_path, 'schemasheets')
    dynamic_schema_path = os.path.join(dir_path, 'example_dynamic_term_set.yaml')
except NameError:
    dir_path = os.path.dirname(os.path.abspath('.'))
    yaml_file = os.path.join(dir_path, 'gallery/example_term_set.yaml')
    schemasheets_folder = os.path.join(dir_path, 'gallery/schemasheets')
    dynamic_schema_path = os.path.join(dir_path, 'gallery/example_dynamic_term_set.yaml')
# Use Schemasheets to create TermSet schema
# -----------------------------------------
# The :py:class:`~hdmf.term_set.TermSet` class builds off of LinkML Schemasheets, allowing users to convert
# a Google spreadsheet to a complete LinkML schema. Once the user has defined the necessary LinkML metadata within the
# spreadsheet, the spreadsheet needs to be saved as individual tsv files, i.e., one tsv file per spreadsheet tab. Please
# refer to the Schemasheets tutorial link above for more details on the required syntax structure within the sheets.
# Once the tsv files are in a folder, the user simply provides the path to the folder with ``schemasheets_folder``.
termset = TermSet(schemasheets_folder=schemasheets_folder)
# Use Dynamic Enumerations to populate TermSet
# --------------------------------------------
# The :py:class:`~hdmf.term_set.TermSet` class allows users to skip manually defining permissible values by pulling from
# a LinkML supported source. These sources contain multiple ontologies. A user can select a node from an ontology,
# in which case all the elements on the branch, starting from the chosen node, will be used as permissible values.
# Please refer to the LinkML Dynamic Enumeration tutorial for more information on these sources and how to set up Dynamic
# Enumerations within the schema. Once the schema is ready, the user provides a path to the schema and sets
# ``dynamic=True``. A new schema, with the populated permissible values, will be created in the same directory.
termset = TermSet(term_schema_path=dynamic_schema_path, dynamic=True)
(Running this step downloads the cl.db.gz ontology database, roughly 83 MB.)
Viewing TermSet values¶
TermSet has methods to retrieve terms. The view_set method will return a dictionary of all the terms and the corresponding information for each term. Users can index specific terms from the TermSet. LinkML runtime will need to be installed; you can do so by first running pip install linkml-runtime.
terms = TermSet(term_schema_path=yaml_file)
print(terms.view_set)
# Retrieve a specific term
terms['Homo sapiens']
{'Homo sapiens': Term_Info(id='NCBI_TAXON:9606', description='the species is human', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606'), 'Mus musculus': Term_Info(id='NCBI_TAXON:10090', description='the species is a house mouse', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090'), 'Ursus arctos horribilis': Term_Info(id='NCBI_TAXON:116960', description='the species is a grizzly bear', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=116960'), 'Myrmecophaga tridactyla': Term_Info(id='NCBI_TAXON:71006', description='the species is an anteater', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=71006')}
Term_Info(id='NCBI_TAXON:9606', description='the species is human', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606')
Validate Data with TermSetWrapper¶
TermSetWrapper can be wrapped around data. To validate data, the user sets the data argument to the wrapped data; validation must pass for the data object to be created.
data = VectorData(
name='species',
description='...',
data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
)
Validate Compound Data with TermSetWrapper¶
TermSetWrapper can be wrapped around compound data. The user sets the field within the compound data type that is to be validated with the termset.
c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
data = VectorData(
name='species',
description='...',
data=TermSetWrapper(value=c_data, termset=terms, field='species')
)
Validate Attributes with TermSetWrapper¶
Similar to wrapping datasets, TermSetWrapper can be wrapped around any attribute. To validate attributes, the user sets the attribute to the wrapped value; validation must pass for the object to be created.
data = VectorData(
name='species',
description=TermSetWrapper(value='Homo sapiens', termset=terms),
data=['Human']
)
Validate on append with TermSetWrapper¶
As mentioned prior, when using a TermSetWrapper, all new data is validated. This is true for adding new data with append and extend.
data = VectorData(
name='species',
description='...',
data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
)
data.append('Ursus arctos horribilis')
data.extend(['Mus musculus', 'Myrmecophaga tridactyla'])
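Appending a value that is not in the term set fails validation. The exact exception type may vary, so this sketch simply catches the error broadly; 'Canis lupus' is assumed to be absent from the example term set.
try:
    data.append('Canis lupus')  # not in the example term set, so validation fails
except Exception as exc:
    print(f"Validation failed as expected: {exc}")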
Validate Data in a DynamicTable¶
Validation of data in a DynamicTable is determined by which columns were initialized with a TermSetWrapper. The data is validated when the columns are created and when they are modified using DynamicTable.add_row.
col1 = VectorData(
name='Species_1',
description='...',
data=TermSetWrapper(value=['Homo sapiens'], termset=terms),
)
col2 = VectorData(
name='Species_2',
description='...',
data=TermSetWrapper(value=['Mus musculus'], termset=terms),
)
species = DynamicTable(name='species', description='My species', columns=[col1,col2])
Validate new rows in a DynamicTable with TermSetWrapper¶
Validating new rows in a DynamicTable is simple. The add_row method will automatically check each column for a TermSetWrapper. If a wrapper is being used, then the data will be validated for that column using that column’s TermSet from the TermSetWrapper. If there is invalid data, the row will not be added and the user will be prompted to fix the new data in order to populate the table.
species.add_row(Species_1='Mus musculus', Species_2='Mus musculus')
Validate new columns in a DynamicTable with TermSetWrapper¶
To add a column that is validated using TermSetWrapper, wrap the data in the add_column method as if you were making a new instance of VectorData.
species.add_column(name='Species_3',
description='...',
data=TermSetWrapper(value=['Ursus arctos horribilis', 'Mus musculus'], termset=terms),)
AlignedDynamicTable¶
This is a user guide to interacting with AlignedDynamicTable objects.
Introduction¶
The AlignedDynamicTable class represents a column-based table with support for grouping columns by category. AlignedDynamicTable inherits from DynamicTable and may contain additional DynamicTable objects, one per sub-category. All tables must align, i.e., they are required to have the same number of rows. Some key features of AlignedDynamicTable are:
support for custom categories, each of which is a DynamicTable stored as part of the AlignedDynamicTable,
support for interacting with category tables individually as well as treating the AlignedDynamicTable as a single large table, and
because AlignedDynamicTable is itself a DynamicTable, users can:
use DynamicTableRegion to reference rows in AlignedDynamicTable,
add custom columns to the AlignedDynamicTable, and
interact with AlignedDynamicTable as well as the category (sub-)tables it contains in the same fashion as with DynamicTable
When to use (and not use) AlignedDynamicTable?¶
AlignedDynamicTable is a useful data structure, but it is also fairly complex, consisting of multiple DynamicTable objects, each of which is itself a complex type composed of many datasets and attributes. In general, if a simpler data structure is sufficient, then consider using it instead. For example, consider using instead:
DynamicTable if a regular table is sufficient.
A compound dataset via Table if all columns of a table are fixed, fast column-based access is not critical, but fast row-based access is.
Multiple, separate tables if using AlignedDynamicTable would lead to duplication of data (i.e., de-normalized data), e.g., by having to replicate values across rows of the table.
Use AlignedDynamicTable when:
you need to group columns in a DynamicTable by category, or
you need to avoid name collisions between columns in a DynamicTable and creating compound columns is not an option.
Constructing a table¶
To create an AlignedDynamicTable, call the constructor with:
name, a string with the name of the table, and
description, a string to describe the table.
from hdmf.common import AlignedDynamicTable
customer_table = AlignedDynamicTable(
name='customers',
description='an example aligned table',
)
Initializing columns of the primary table¶
The basic behavior of adding data to and initializing an AlignedDynamicTable is the same as in DynamicTable. See the DynamicTable How-To Guide for details. E.g., using the columns and colnames parameters (which are inherited from DynamicTable) we can define the columns of the primary table. All columns must have the same length.
from hdmf.common import VectorData
col1 = VectorData(
name='firstname',
description='Customer first name',
data=['Peter', 'Emma']
)
col2 = VectorData(
name='lastname',
description='Customer last name',
data=['Williams', 'Brown']
)
customer_table = AlignedDynamicTable(
name='customer',
description='an example aligned table',
columns=[col1, col2]
)
Initializing categories¶
By specifying category_tables as a list of DynamicTable objects we can then directly specify the sub-category tables. Optionally, we can also set categories, the names of the sub-tables, as an array of strings to define the ordering of categories.
from hdmf.common import DynamicTable
# create the home_address category table
subcol1 = VectorData(
name='city',
description='city',
data=['Rivercity', 'Mountaincity']
)
subcol2 = VectorData(
name='street',
description='street data',
data=['Amazonstreet', 'Alpinestreet']
)
homeaddress_table = DynamicTable(
name='home_address',
description='home address of the customer',
columns=[subcol1, subcol2]
)
# create the table
customer_table = AlignedDynamicTable(
name='customer',
description='an example aligned table',
columns=[col1, col2],
category_tables=[homeaddress_table, ]
)
# render the table in the online docs
customer_table.to_dataframe()
Adding more data to the table¶
We can add rows, columns, and new categories to the table.
Adding a row¶
To add a row via add_row we can either: 1) provide the row data as a single dict to the data parameter or 2) specify a dict for each category and column as keyword arguments. Additional optional arguments include id and enforce_unique_id.
customer_table.add_row(
firstname='Paul',
lastname='Smith',
home_address={'city': 'Bugcity',
'street': 'Beestree'}
)
# render the table in the online docs
customer_table.to_dataframe()
Adding a column¶
To add a column we use add_column.
customer_table.add_column(
name='zipcode',
description='zip code of the city',
data=[11111, 22222, 33333], # specify data for the 3 rows in the table
category='home_address' # use None (or omit) to add columns to the primary table
)
# render the table in the online docs
customer_table.to_dataframe()
Adding a category¶
To add a new DynamicTable as a category, we use add_category.
Note
Only regular DynamicTable objects are allowed as category tables. Using an AlignedDynamicTable as a category for another AlignedDynamicTable is currently not supported.
# create a new category DynamicTable for the work address
subcol1 = VectorData(
name='city',
description='city',
data=['Busycity', 'Worktown', 'Labortown']
)
subcol2 = VectorData(
name='street',
description='street data',
data=['Cannery Row', 'Woodwork Avenue', 'Steel Street']
)
subcol3 = VectorData(
name='zipcode',
description='zip code of the city',
data=[33333, 44444, 55555])
workaddress_table = DynamicTable(
name='work_address',
description='work address of the customer',
columns=[subcol1, subcol2, subcol3]
)
# add the category to our AlignedDynamicTable
customer_table.add_category(category=workaddress_table)
# render the table in the online docs
customer_table.to_dataframe()
Note
Because each category is stored as a separate DynamicTable, there are no name collisions between the columns of the home_address and work_address tables, so both can contain matching city, street, and zipcode columns. However, since a category table is a sub-part of the primary table, categories must not have the same name as other columns or other categories in the primary table.
Accessing categories, columns, rows, and cells¶
Convert to a pandas DataFrame¶
If we need to access the whole table for analysis, then converting the table to a pandas DataFrame is a convenient option. To ignore the id columns of all category tables we can simply set the ignore_category_ids parameter.
# render the table in the online docs while ignoring the id column of category tables
customer_table.to_dataframe(ignore_category_ids=True)
Accessing categories¶
# Get the list of all categories
_ = customer_table.categories
# Get the DynamicTable object of a particular category
_ = customer_table.get_category(name='home_address')
# Alternatively, we can use normal array slicing to get the category as a pandas DataFrame.
# NOTE: In contrast to the previous call, the table is here converted to a DataFrame.
_ = customer_table['home_address']
Accessing columns¶
We can use the standard Python in operator to check if a column exists.
# To check if a column exists in the primary table we only need to specify the column name
# or alternatively specify the category as None
_ = 'firstname' in customer_table
_ = (None, 'firstname') in customer_table
# To check if a column exists in a category table we need to specify the category
# and column name as a tuple
_ = ('home_address', 'zipcode') in customer_table
We can use standard array slicing to get the VectorData object of a column.
# To get a column from the primary table we just provide the name.
_ = customer_table['firstname']
# To get a column from a category table we provide both the category name and column name
_ = customer_table['home_address', 'city']
Accessing rows¶
Accessing rows works much like in the DynamicTable How-To Guide.
# Get a single row by index as a DataFrame
customer_table[1]
# Get a range of rows as a DataFrame
customer_table[0:2]
# Get a list of rows as a DataFrame
customer_table[[0, 2]]
Accessing cells¶
To get a set of cells we need to specify the: 1) category, 2) column, and 3) row index when slicing into the table.
When selecting from the primary table we need to specify None for the category, followed by the column name and the selection.
# Select rows 0:2 from the 'firstname' column in the primary table
customer_table[None, 'firstname', 0:2]
['Peter', 'Emma']
# Select row 1 from the 'firstname' column in the primary table
customer_table[None, 'firstname', 1]
'Emma'
# Select rows 0 and 2 from the 'firstname' column in the primary table
customer_table[None, 'firstname', [0, 2]]
['Peter', 'Paul']
# Select rows 0:2 from the 'city' column of the 'home_address' category table
customer_table['home_address', 'city', 0:2]
['Rivercity', 'Mountaincity']
HERD: HDMF External Resources Data Structure¶
This is a user guide to interacting with the HERD class. The HERD type is experimental and is subject to change in future releases. If you use this type, please provide feedback to the HDMF team so that we can improve the structure and access of data stored with this type for your use cases.
Introduction¶
The HERD class provides a way to organize and map user terms from their data (keys) to multiple entities from external resources. A typical use case for external resources is to link data stored in datasets or attributes to ontologies. For example, you may have a dataset country storing locations. Using HERD allows us to link the country names stored in the dataset to an ontology of all countries, enabling more rigid standardization of the data and facilitating data query and introspection.
From a user’s perspective, one can think of HERD as a simple table, in which each row associates a particular key stored in a particular object (i.e., an attribute or dataset in a file) with a particular entity (i.e., a term of an online resource). That is, (object, key) refers to parts inside a file and entity refers to an external resource outside the file, and HERD allows us to link the two. To reduce data redundancy and improve data integrity, HERD stores this data internally in a collection of interlinked tables:
EntityTable, where each row describes an Entity
EntityKeyTable, where each row describes an EntityKey
ObjectTable, where each row describes an Object
ObjectKeyTable, where each row describes an ObjectKey pair identifying which keys are used by which objects.
The HERD class then provides convenience functions to simplify interaction with these tables, allowing users to treat HERD as a single large table as much as possible.
Rules to HERD¶
When using the HERD class, there are rules for how users store information in the interlinked tables.
Multiple Key objects can have the same name. They are disambiguated by the Object associated with each, meaning we may have keys with the same name in different objects, but for a particular object all keys must be unique.
In order to query specific records, the HERD class uses '(file, object_id, relative_path, field, key)' as the unique identifier.
Do not use the private methods to add to the KeyTable, FileTable, EntityTable, ObjectTable, ObjectKeyTable, or EntityKeyTable individually.
URIs are optional, but highly recommended. If not known, an empty string may be used.
An entity ID should be the unique string identifying the entity in the given resource. This may or may not include a string representing the resource and a colon. Use the format provided by the resource. For example, Identifiers.org uses the ID ncbigene:22353 but the NCBI Gene uses the ID 22353 for the same term.
In the majority of cases, Object objects will have an empty string for 'field'. The HERD class supports compound data_types; in that case, 'field' would be the field of the compound data_type that has an external reference.
In some cases, the attribute that needs an external reference is not an object with a 'data_type'. The user must then use the nearest object that has a data type as the parent object. When adding an external resource for an object with a data type, users should not provide an attribute. When adding an external resource for an attribute of an object, users need to provide the name of the attribute.
The user must provide a File or an Object that has a File along the parent hierarchy.
Creating an instance of the HERD class¶
from hdmf.common import HERD
from hdmf.common import DynamicTable, VectorData
from hdmf.term_set import TermSet
from hdmf import Container, HERDManager
from hdmf import Data
import numpy as np
import os
# Ignore experimental feature warnings in the tutorial to improve rendering
import warnings
warnings.filterwarnings("ignore", category=UserWarning, message="HERD is experimental*")
try:
    dir_path = os.path.dirname(os.path.abspath(__file__))
    yaml_file = os.path.join(dir_path, 'example_term_set.yaml')
except NameError:
    dir_path = os.path.dirname(os.path.abspath('.'))
    yaml_file = os.path.join(dir_path, 'gallery/example_term_set.yaml')
# Class to represent a file
class HERDManagerContainer(Container, HERDManager):
    def __init__(self, **kwargs):
        kwargs['name'] = 'HERDManagerContainer'
        super().__init__(**kwargs)
herd = HERD()
file = HERDManagerContainer(name='file')
Using the add_ref method¶
add_ref is a wrapper function provided by the HERD class that simplifies adding data. Using add_ref allows us to treat new entries similarly to adding a new row to a flat table, with add_ref taking care of populating the underlying data structures accordingly.
data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])
data.parent = file
herd.add_ref(
file=file,
container=data,
key='Homo sapiens',
entity_id='NCBI_TAXON:9606',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)
herd.add_ref(
file=file,
container=data,
key='Mus musculus',
entity_id='NCBI_TAXON:10090',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)
Using the add_ref method with an attribute¶
It is important to keep in mind that when adding an Object to the ObjectTable, the parent object identified by Object.object_id must be the closest parent to the target object (i.e., Object.relative_path must be the shortest possible path and as such cannot contain any objects with a data_type and associated object_id). A common example would be with the DynamicTable class, which holds VectorData objects as columns. If we wanted to add an external reference on a column from a DynamicTable, then we would use the column as the object and not the DynamicTable (refer to the rules above).
genotypes = DynamicTable(name='genotypes', description='My genotypes')
genotypes.add_column(name='genotype_name', description="Name of genotypes")
genotypes.add_row(id=0, genotype_name='Rorb')
genotypes.parent = file
herd.add_ref(
file=file,
container=genotypes,
attribute='genotype_name',
key='Rorb',
entity_id='MGI:1346434',
entity_uri='http://www.informatics.jax.org/marker/MGI:1343464'
)
# Note: :py:func:`~hdmf.common.resources.HERD.add_ref` internally resolves the object
# to the closest parent, so that ``herd.add_ref(container=genotypes, attribute='genotype_name')`` and
# ``herd.add_ref(container=genotypes.genotype_name, attribute=None)`` will ultimately both use the ``object_id``
# of the ``genotypes.genotype_name`` :py:class:`~hdmf.common.table.VectorData` column and
# not the object_id of the genotypes table.
Using the add_ref method without the file parameter.¶
Even though File is required to create/add a new reference, the user can omit the file parameter if the Object has a File in its parent hierarchy.
col1 = VectorData(
name='Species_Data',
description='species from NCBI and Ensemble',
data=['Homo sapiens', 'Ursus arctos horribilis'],
)
# Create a DynamicTable with this column and set the table parent to the file object created earlier
species = DynamicTable(name='species', description='My species', columns=[col1])
species.parent = file
herd.add_ref(
container=species,
attribute='Species_Data',
key='Ursus arctos horribilis',
entity_id='NCBI_TAXON:116960',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id'
)
Visualize HERD¶
Users can visualize HERD as a flattened table or as separate tables.
# HERD as a flattened table
herd.to_dataframe()
# The individual interlinked tables:
herd.files.to_dataframe()
herd.objects.to_dataframe()
herd.entities.to_dataframe()
herd.keys.to_dataframe()
herd.object_keys.to_dataframe()
herd.entity_keys.to_dataframe()
Using the get_key method¶
The get_key method will return a Key object. In the current version of HERD, duplicate keys are allowed; however, each key needs a unique linking Object. In other words, each combination of (file, container, relative_path, field, key) can exist only once in HERD.
# The :py:func:`~hdmf.common.resources.HERD.get_key` method will be able to return the
# :py:class:`~hdmf.common.resources.Key` object if the :py:class:`~hdmf.common.resources.Key` object is unique.
genotype_key_object = herd.get_key(key_name='Rorb')
# If the :py:class:`~hdmf.common.resources.Key` object has a duplicate name, then the user will need
# to provide the unique (file, container, relative_path, field, key) combination.
species_key_object = herd.get_key(file=file,
container=species['Species_Data'],
key_name='Ursus arctos horribilis')
# The :py:func:`~hdmf.common.resources.HERD.get_key` also will check the
# :py:class:`~hdmf.common.resources.Object` for a :py:class:`~hdmf.common.resources.File` along the parent hierarchy
# if the file is not provided as in :py:func:`~hdmf.common.resources.HERD.add_ref`
Using the add_ref method with a key_object¶
Multiple Object objects can use the same Key. To use an existing key when adding new entries into HERD, pass the Key object instead of the 'key_name' to the add_ref method. If a 'key_name' is used, a new Key will be created.
herd.add_ref(
file=file,
container=genotypes,
attribute='genotype_name',
key=genotype_key_object,
entity_id='ENSEMBL:ENSG00000198963',
entity_uri='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000198963'
)
Using the get_object_entities method¶
The get_object_entities method allows the user to retrieve all entities and key information associated with an Object in the form of a pandas DataFrame.
herd.get_object_entities(file=file,
container=genotypes['genotype_name'],
relative_path='')
Using the get_object_type method¶
The get_object_type method allows the user to retrieve all entities and key information associated with objects of a particular type, in the form of a pandas DataFrame.
herd.get_object_type(object_type='Data')
Special Case: Using add_ref with compound data¶
In most cases, the field is left as an empty string, but if the dataset or attribute is a compound data_type, then we can use the ‘field’ value to differentiate the different columns of the dataset. For example, if a dataset has a compound data_type with columns/fields ‘x’, ‘y’, and ‘z’, and each column/field is associated with different ontologies, then use field=’x’ to denote that ‘x’ is using the external reference.
# Let's create a new instance of :py:class:`~hdmf.common.resources.HERD`.
herd = HERD()
data = Data(
name='data_name',
data=np.array(
[('Mus musculus', 9, 81.0), ('Homo sapiens', 3, 27.0)],
dtype=[('species', 'U14'), ('age', 'i4'), ('weight', 'f4')]
)
)
data.parent = file
herd.add_ref(
file=file,
container=data,
field='species',
key='Mus musculus',
entity_id='NCBI_TAXON:txid10090',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)
Using add_ref_termset¶
The add_ref_termset method allows users to not only validate terms, i.e., keys, but also add references for an entire dataset, rather than single entries as we saw prior with add_ref.
# :py:func:`~hdmf.common.resources.HERD.add_ref_termset` has many optional fields,
# giving the user a range of control when adding references. Let's see an example.
herd = HERD()
terms = TermSet(term_schema_path=yaml_file)
herd.add_ref_termset(file=file,
container=species,
attribute='Species_Data',
key='Ursus arctos horribilis',
termset=terms)
Using add_ref_termset for an entire dataset¶
As mentioned above, add_ref_termset supports iteratively validating and populating HERD.
# When populating :py:class:`~hdmf.common.resources.HERD`, users may have some terms
# that are not in the :py:class:`~hdmf.term_set.TermSet`. As a result,
# :py:func:`~hdmf.common.resources.HERD.add_ref_termset` will return all of the missing
# terms in a dictionary. It is up to the user to either add these terms to the
# :py:class:`~hdmf.term_set.TermSet` or remove them from the dataset.
herd = HERD()
terms = TermSet(term_schema_path=yaml_file)
herd.add_ref_termset(file=file,
container=species,
attribute='Species_Data',
termset=terms)
Write HERD¶
HERD is written as a zip file of the individual tables written to tsv. The user provides the path, which contains the name of the file.
herd.to_zip(path='./HERD.zip')
Read HERD¶
Users can read HERD from the zip file by providing the path to the file itself.
er_read = HERD.from_zip(path='./HERD.zip')
os.remove('./HERD.zip')
DynamicTable How-To Guide¶
This is a user guide to interacting with DynamicTable objects.
Introduction¶
The DynamicTable class represents a column-based table to which you can add custom columns. It consists of a name, a description, a list of row IDs, and a list of columns. Columns are represented by objects of the class VectorData, including subclasses of VectorData, such as VectorIndex and DynamicTableRegion.
Constructing a table¶
To create a DynamicTable, call the constructor for DynamicTable with a string name and a string description. Specifying the arguments with keywords is recommended.
from hdmf.common import DynamicTable
table = DynamicTable(
name='my_table',
description='an example table',
)
Initializing columns¶
You can create a DynamicTable with particular columns by passing a list or tuple of VectorData objects for the columns argument in the constructor. If the VectorData objects contain data values, then each VectorData object must contain the same number of rows as the others. A list of row IDs may be passed into the DynamicTable constructor using the id argument. If IDs are passed in, there should be the same number of rows as the column data. If IDs are not passed in, then the IDs will be set to range(len(column_data)) by default.
from hdmf.common import VectorData, VectorIndex
col1 = VectorData(
name='col1',
description='column #1',
data=[1, 2],
)
col2 = VectorData(
name='col2',
description='column #2',
data=['a', 'b'],
)
# this table will have two rows with ids 0 and 1
table = DynamicTable(
name='my table',
description='an example table',
columns=[col1, col2],
)
# this table will have two rows with ids 0 and 1
table_set_ids = DynamicTable(
name='my table',
description='an example table',
columns=[col1, col2],
id=[0, 1],
)
If a list of integers is passed to id
,
DynamicTable
automatically creates
an ElementIdentifiers
object, which is the data type
that stores row IDs. The above command is equivalent to:
from hdmf.common.table import ElementIdentifiers
table_set_ids = DynamicTable(
name='my table',
description='an example table',
columns=[col1, col2],
id=ElementIdentifiers(name='id', data=[0, 1]),
)
Adding rows¶
You can also add rows to a DynamicTable
using
DynamicTable.add_row
.
A keyword argument for every column in the table must be supplied.
You may also supply an optional row ID.
table.add_row(
col1=3,
col2='c',
id=2,
)
Note
If no ID is supplied, the row ID is automatically set to the number of rows of the table prior to adding the new
row. This can result in duplicate IDs. In general, IDs should be unique, but this is not enforced by default.
Pass enforce_unique_id=True to DynamicTable.add_row
to raise an error if the ID is set to an existing ID value.
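For instance, a minimal sketch of that behavior (assuming a ValueError is raised for the duplicate ID; id 2 was used above):
try:
    table.add_row(col1=99, col2='x', id=2, enforce_unique_id=True)
except ValueError as err:
    print(err)  # the duplicate ID is rejected and no row is added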
# this row will have ID 3 by default
table.add_row(
col1=4,
col2='d',
)
Adding columns¶
You can add columns to a DynamicTable
using
DynamicTable.add_column
.
If the table already has rows, then the data
argument must be supplied
as a list of values, one for each row already in the table.
table.add_column(
name='col3',
description='column #3',
data=[True, True, False, True], # specify data for the 4 rows in the table
)
Enumerated (categorical) data¶
EnumData
is a special type of column for storing
an enumerated data type. This way each unique value is stored once, and the data
references those values by index. Using this method is more efficient than storing
a single value many times, and has the advantage of communicating to downstream
tools that the data is categorical in nature.
Warning
EnumData
is currently an experimental
feature and as such should not be used in production.
from hdmf.common.table import EnumData
import warnings
warnings.filterwarnings(action="ignore", message="EnumData is experimental")
# this column has a length of 5, not 3. the first row has value "aa"
enum_col = EnumData(
name='cell_type',
description='this column holds categorical variables',
data=[0, 1, 2, 1, 0],
elements=['aa', 'bb', 'cc']
)
my_table = DynamicTable(
name='my_table',
description='an example table',
columns=[enum_col],
)
Ragged array columns¶
A table column with a different number of elements for each row is called a
“ragged array column”. To initialize a DynamicTable
with a ragged array column, pass both
the VectorIndex
and its target
VectorData
in for the columns
argument in the constructor. For instance, the following code creates a column
called col1
where the first cell is [‘1a’, ‘1b’, ‘1c’] and the second cell
is [‘2a’].
col1 = VectorData(
name='col1',
description='column #1',
data=['1a', '1b', '1c', '2a'],
)
# the 3 signifies that elements 0 to 3 (exclusive) of the target column belong to the first row
# the 4 signifies that elements 3 to 4 (exclusive) of the target column belong to the second row
col1_ind = VectorIndex(
name='col1_index',
target=col1,
data=[3, 4],
)
table_ragged_col = DynamicTable(
name='my table',
description='an example table',
columns=[col1, col1_ind],
)
Note
By convention, the name of the VectorIndex
should be
the name of the target column with the added suffix “_index”.
VectorIndex.data provides the indices for how to break VectorData.data into cells
You can add an empty ragged array column to an existing
DynamicTable
by specifying index=True
to DynamicTable.add_column
.
This method only works if run before any rows have been added to the table.
new_table = DynamicTable(
name='my_table',
description='an example table',
)
new_table.add_column(
name='col4',
description='column #4',
index=True,
)
If the table already contains data, you must specify the new column values for
the existing rows using the data
argument and you must specify the end indices of
the data
argument that correspond to each row as a list/tuple/array of values for
the index
argument.
table.add_column( # <-- this table already has 4 rows
name='col4',
description='column #4',
data=[1, 0, -1, 0, -1, 1, 1, -1],
index=[3, 4, 6, 8], # specify the end indices (exclusive) of data for each row
)
Alternatively, we may also define the ragged array data as a nested list
and use the index
argument to indicate the number of levels. In this case,
the add_column
function will
automatically flatten the data array and compute the corresponding index vectors.
table.add_column( # <-- this table already has 4 rows
name='col5',
description='column #5',
data=[[[1, ], [2, 2]], # row 1
[[3, 3], ], # row 2
[[4, ], [5, 5]], # row 3
[[6, 6], [7, 7, 7]]], # row 4
index=2 # number of levels in the ragged array
)
# Show that the ragged array was converted to flat VectorData with a double VectorIndex
print("Flattened data: %s" % str(table.col5.data))
print("Level 1 index: %s" % str(table.col5_index.data))
print("Level 2 index: %s" % str(table.col5_index_index.data))
Flattened data: [1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7]
Level 1 index: [1, 3, 5, 6, 8, 10, 13]
Level 2 index: [2, 3, 5, 7]
Referencing rows of other tables¶
You can create a column that references rows of another table by adding a
DynamicTableRegion
object as a column of your
DynamicTable
. This is analogous to
a foreign key in a relational database.
from hdmf.common.table import DynamicTableRegion
dtr_col = DynamicTableRegion(
name='table1_ref',
description='references rows of earlier table',
data=[0, 1, 0, 0], # refers to row indices of the 'table' variable
table=table
)
data_col = VectorData(
name='col2',
description='column #2',
data=['a', 'a', 'a', 'b'],
)
table2 = DynamicTable(
name='my_table',
description='an example table',
columns=[dtr_col, data_col],
)
Here, the data
of dtr_col
maps to rows of table
(0-indexed).
Note
The data
values of DynamicTableRegion
map to the row
index, not the row ID, though if you are using default IDs, these values will be the
same.
Reference more than one row of another table with a
DynamicTableRegion
indexed by a
VectorIndex
.
indexed_dtr_col = DynamicTableRegion(
name='table1_ref2',
description='references multiple rows of earlier table',
data=[0, 0, 1, 1, 0, 0, 1],
table=table
)
# row 0 refers to rows [0, 0], row 1 refers to rows [1], row 2 refers to rows [1, 0], row 3 refers to rows [0, 1] of
# the "table" variable
dtr_idx = VectorIndex(
name='table1_ref2_index',
target=indexed_dtr_col,
data=[2, 3, 5, 7],
)
table3 = DynamicTable(
name='my_table',
description='an example table',
columns=[dtr_idx, indexed_dtr_col],
)
Setting the target table of a DynamicTableRegion column of a DynamicTable¶
A subclass of DynamicTable might have a pre-defined DynamicTableRegion column.
To write this column correctly, the “table” attribute of the column must be set so
that users know which table the row index values refer to. Because the target
table could be any table, the “table” attribute must be set explicitly. There are three
ways to do so. First, you can use the target_tables
argument of the
DynamicTable constructor as shown below. This argument
is a dictionary mapping the name of the DynamicTableRegion column to
the target table. Secondly, the target table can be set after the DynamicTable
has been initialized using my_table.my_column.table = other_table
. Finally,
you can create the DynamicTableRegion column and pass the table
attribute to DynamicTableRegion.__init__ and then pass the column to
DynamicTable.__init__ using the columns argument. However, this approach
is not recommended for columns defined in the schema, because it is up to
the user to ensure that the column is created in accordance with the schema.
class SubTable(DynamicTable):
__columns__ = (
{'name': 'dtr', 'description': 'required region', 'required': True, 'table': True},
)
referenced_table = DynamicTable(
name='referenced_table',
description='an example table',
)
sub_table = SubTable(
name='sub_table',
description='an example table',
target_tables={'dtr': referenced_table},
)
# now the target table of the DynamicTableRegion column 'dtr' is set to `referenced_table`
Creating an expandable table¶
When using the default HDF5 backend, each column of these tables is an HDF5 Dataset,
which by default is fixed in size. This means that once a file is written, it is not
possible to add a new row. If you want to be able to save this file, load it, and add
more rows to the table, you will need to set this up when you create the
DynamicTable
. You do this by wrapping the data with
H5DataIO
and the argument maxshape=(None, )
.
from hdmf.backends.hdf5.h5_utils import H5DataIO
col1 = VectorData(
name='expandable_col1',
description='column #1',
data=H5DataIO(data=[1, 2], maxshape=(None,)),
)
col2 = VectorData(
name='expandable_col2',
description='column #2',
data=H5DataIO(data=['a', 'b'], maxshape=(None,)),
)
# don't forget to wrap the row IDs too!
ids = ElementIdentifiers(
name='id',
data=H5DataIO(data=[0, 1], maxshape=(None,)),
)
expandable_table = DynamicTable(
name='expandable_table',
description='an example table that can be expanded after being saved to a file',
columns=[col1, col2],
id=ids,
)
Now you can write the file, read it back, and run expandable_table.add_row()
.
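For example, a minimal sketch of that round trip (the file name is illustrative; writing uses the hdmf-common type manager):
from hdmf.common import get_manager
from hdmf.backends.hdf5 import HDF5IO

# write the expandable table to an HDF5 file
with HDF5IO('expandable_table.h5', manager=get_manager(), mode='w') as io:
    io.write(expandable_table)

# re-open the file in read/write mode, read the table, and append a row;
# the datasets can grow because they were created with maxshape=(None,)
with HDF5IO('expandable_table.h5', manager=get_manager(), mode='a') as io:
    read_table = io.read()
    read_table.add_row(expandable_col1=3, expandable_col2='c', id=2)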
In this example, we are setting maxshape
to (None,)
, which means this is a
1-dimensional matrix that can expand indefinitely along its single dimension. You
could also use an integer in place of None
. For instance, maxshape=(8,)
would
allow the column to grow up to a length of 8. Whichever maxshape
you choose,
it should be the same for all VectorData
and
ElementIdentifiers
objects in the
DynamicTable
, since they must always be the same
length. The default ElementIdentifiers
automatically
generated when you pass a list of integers to the id
argument of the
DynamicTable
constructor is not expandable, so do not
forget to create an ElementIdentifiers
object, and wrap
that data as well. If any of the columns are indexed, the data
argument of
VectorIndex
will also need to be wrapped with
H5DataIO
.
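For example, a sketch of an expandable ragged array column, where both the column data and the index data are wrapped:
ragged_col = VectorData(
    name='expandable_ragged_col',
    description='an expandable ragged column',
    data=H5DataIO(data=['1a', '1b', '1c', '2a'], maxshape=(None,)),
)
ragged_col_index = VectorIndex(
    name='expandable_ragged_col_index',
    target=ragged_col,
    data=H5DataIO(data=[3, 4], maxshape=(None,)),
)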
Converting the table to a pandas DataFrame
¶
pandas is a popular data analysis tool, especially for working with tabular data.
You can convert your DynamicTable
to a
DataFrame
using
DynamicTable.to_dataframe
.
Accessing the table as a DataFrame
provides you with powerful,
standard methods for indexing, selecting, and querying tabular data from pandas.
This is the recommended method of reading data from your table. See also the pandas indexing documentation.
Printing a DynamicTable
as a DataFrame
or displaying the DataFrame
in Jupyter shows a more intuitive
tabular representation of the data than printing the
DynamicTable
object.
df = table.to_dataframe()
Note
Changes to the DataFrame
will not be saved in the DynamicTable
.
Converting the table from a pandas DataFrame
¶
If your data is already in a DataFrame
, you can convert the
DataFrame
to a DynamicTable
using the class method
DynamicTable.from_dataframe
.
table_from_df = DynamicTable.from_dataframe(
name='my_table',
df=df,
)
Accessing elements¶
To access an element in the i-th row in the column with name “col_name” in a
DynamicTable
, use square brackets notation:
table[i, col_name]
. You can also use a tuple of row index and column
name within the square brackets.
table[0, 'col1'] # returns 1
table[(0, 'col1')] # returns 1
1
If the column is a ragged array, instead of a single value being returned, a list of values for that element is returned.
table[0, 'col4'] # returns [1, 0, -1]
[1, 0, -1]
Standard Python and numpy slicing can be used for the row index.
import numpy as np
table[:2, 'col1'] # get a list of elements from the first two rows at column 'col1'
table[0:3:2, 'col1'] # get a list of elements from rows 0 to 3 (exclusive) in steps of 2 at column 'col1'
table[3::-1, 'col1'] # get a list of elements from rows 3 to 0 in reverse order at column 'col1'
# the following are equivalent to table[0:3:2, 'col1']
table[slice(0, 3, 2), 'col1']
table[np.s_[0:3:2], 'col1']
table[[0, 2], 'col1']
table[np.array([0, 2]), 'col1']
[1, 3]
If the column is a ragged array, instead of a list of row values being returned, a list of list elements for the selected rows is returned.
table[:2, 'col4'] # returns [[1, 0, -1], [0]]
[[1, 0, -1], [0]]
Note
You cannot supply a list/tuple for the column name. For this
kind of access, first convert the DynamicTable
to a DataFrame
.
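For example, to select several columns at once, go through pandas:
df = table.to_dataframe()
df[['col1', 'col2']]  # select multiple columns by name from the DataFrame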
Accessing columns¶
To access all the values in a column, use square brackets with a colon for the
row index: table[:, col_name]
. If the column is a ragged array, a list of
list elements is returned.
table[:, 'col1'] # returns [1, 2, 3, 4]
table[:, 'col4'] # returns [[1, 0, -1], [0], [-1, 1], [1, -1]]
[[1, 0, -1], [0], [-1, 1], [1, -1]]
Accessing rows¶
To access the i-th row in a DynamicTable
, returned
as a DataFrame
, use the syntax table[i]
. Standard Python
and numpy slicing can be used for the row index.
table[0] # get the 0th row of the table as a DataFrame
table[:2] # get the first two rows
table[0:3:2] # get rows 0 to 3 (exclusive) in steps of 2
table[3::-1] # get rows 3 to 0 in reverse order
# the following are equivalent to table[0:3:2]
table[slice(0, 3, 2)]
table[np.s_[0:3:2]]
table[[0, 2]]
table[np.array([0, 2])]
Note
The syntax table[i]
returns the i-th row, NOT the row with ID of i.
Iterating over rows¶
To iterate over the rows of a DynamicTable
,
first convert the DynamicTable
to a
DataFrame
using
DynamicTable.to_dataframe
.
For more information on iterating over a DataFrame
,
see https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#iteration
df = table.to_dataframe()
for row in df.itertuples():
print(row)
Pandas(Index=0, col1=1, col2='a', col3=True, col4=[1, 0, -1], col5=[[1], [2, 2]])
Pandas(Index=1, col1=2, col2='b', col3=True, col4=[0], col5=[[3, 3]])
Pandas(Index=2, col1=3, col2='c', col3=False, col4=[-1, 1], col5=[[4], [5, 5]])
Pandas(Index=3, col1=4, col2='d', col3=True, col4=[1, -1], col5=[[6, 6], [7, 7, 7]])
Accessing the column data types¶
To access the VectorData
or
VectorIndex
object representing a column, you
can use three different methods. Use the column name in square brackets, e.g.,
table[col_name]
, use the
DynamicTable.get
method, or
use the column name as an attribute, e.g., table.col_name
.
table['col1']
table.get('col1') # equivalent to table['col1'] except this returns None if 'col1' is not found
table.get('col1', default=0) # you can change the default return value
table.col1
<hdmf.common.table.VectorData object at 0x7f7c64c1f070>
Note
Using the column name as an attribute does NOT work if the column name is
the same as a non-column name attribute or method of the
DynamicTable
class,
e.g., name
, description
, object_id
, parent
, modified
.
If the column is a ragged array, then the methods above will return the
VectorIndex
associated with the ragged array.
table['col4']
table.get('col4') # equivalent to table['col4'] except this returns None if 'col4' is not found
table.get('col4', default=0) # you can change the default return value
<hdmf.common.table.VectorIndex object at 0x7f7c3ad29e20>
Note
The attribute syntax table.col_name
currently returns the VectorData
instead of the VectorIndex
for a ragged array. This is a known
issue and will be fixed in a future version of HDMF.
Accessing elements from column data types¶
Standard Python and numpy slicing can be used on the
VectorData
or
VectorIndex
objects to access elements from
column data. If the column is a ragged array, then instead of a list of row
values being returned, a list of list elements for the selected rows is returned.
table['col1'][0] # get the 0th element from column 'col1'
table['col1'][:2] # get a list of the 0th and 1st elements
table['col1'][0:3:2] # get a list of the 0th to 3rd (exclusive) elements in steps of 2
table['col1'][3::-1] # get a list of the 3rd to 0th elements in reverse order
# the following are equivalent to table['col1'][0:3:2]
table['col1'][slice(0, 3, 2)]
table['col1'][np.s_[0:3:2]]
table['col1'][[0, 2]]
table['col1'][np.array([0, 2])]
# this slicing and indexing works for ragged array columns as well
table['col4'][:2] # get a list of the 0th and 1st list elements
[[1, 0, -1], [0]]
Note
The syntax table[col_name][i]
is equivalent to table[i, col_name]
.
Multi-dimensional columns¶
A column can be represented as a multi-dimensional rectangular array or a list of lists, each containing the same number of elements.
col5 = VectorData(
name='col5',
description='column #5',
data=[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
)
Ragged multi-dimensional columns¶
Each element within a column can be an n-dimensional array or list or lists. This is true for ragged array columns as well.
col6 = VectorData(
name='col6',
description='column #6',
data=[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
)
col6_ind = VectorIndex(
name='col6_index',
target=col6,
data=[2, 3],
)
Nested ragged array columns¶
In the example above, the ragged array column has two rows. The first row has two elements,
where each element has 3 sub-elements. This can be thought of as a 2x3 array.
The second row has one element with 3 sub-elements, or a 1x3 array. This
works only if the data for col6
is a rectangular array, that is, each row
element contains the same number of sub-elements. If each row element does
not contain the same number of sub-elements, then a nested ragged array
approach must be used instead.
A VectorIndex
object can index another
VectorIndex
object. For example, the first row
of a table might be a 2x3 array, the second row might be a 3x2 array, and the
third row might be a 1x1 array. This cannot be represented by a singly
indexed column, but can be represented by a nested ragged array column.
col7 = VectorData(
name='col7',
description='column #7',
data=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm'],
)
col7_ind = VectorIndex(
name='col7_index',
target=col7,
data=[3, 6, 8, 10, 12, 13],
)
col7_ind_ind = VectorIndex(
name='col7_index_index',
target=col7_ind,
data=[2, 5, 6],
)
# all indices must be added to the table
table_double_ragged_col = DynamicTable(
name='my table',
description='an example table',
columns=[col7, col7_ind, col7_ind_ind],
)
Access the first row using the same syntax as before, except now a list of lists is returned. You can then index the resulting list of lists to access the individual elements.
table_double_ragged_col[0, 'col7'] # returns [['a', 'b', 'c'], ['d', 'e', 'f']]
table_double_ragged_col['col7'][0] # same as line above
[['a', 'b', 'c'], ['d', 'e', 'f']]
Accessing the column named ‘col7’ using square bracket notation will return
the top-level VectorIndex
for the column.
Accessing the column named ‘col7’ using dot notation will return the
VectorData
object.
table_double_ragged_col['col7'] # returns col7_ind_ind
table_double_ragged_col.col7 # returns the col7 VectorData object
<hdmf.common.table.VectorData object at 0x7f7c3abc5a30>
Accessing data from a DynamicTable
that contain references to rows of other DynamicTable
objects¶
By default, when
DynamicTable.__getitem__
and DynamicTable.get
are supplied
with an int, list of ints, numpy array, or a slice representing rows to return,
a pandas DataFrame
is returned. If the
DynamicTable
contains a
DynamicTableRegion
column that references rows
of other DynamicTable
objects, then by default, the
DynamicTable.__getitem__
and DynamicTable.get
methods will
return row indices of the referenced table, and not the contents of the referenced
table. To return the contents of the referenced table as a nested
DataFrame
containing only the referenced rows, use
DynamicTable.get
with index=False
.
# create a new table of users
users_table = DynamicTable(
name='users',
description='a table containing data/metadata about users, one user per row',
)
# add simple columns to this table
users_table.add_column(
name='first_name',
description='the first name of the user',
)
users_table.add_column(
name='last_name',
description='the last name of the user',
)
# create a new table of addresses to reference
addresses_table = DynamicTable(
name='addresses',
description='a table containing data/metadata about addresses, one address per row',
)
addresses_table.add_column(
name='street_address',
description='the street number and address',
)
addresses_table.add_column(
name='city',
description='the city of the address',
)
# add rows to the addresses table
addresses_table.add_row(
street_address='123 Main St',
city='Springfield'
)
addresses_table.add_row(
street_address='45 British Way',
city='London'
)
# add a column to the users table that references rows of the addresses table
users_table.add_column(
name='address',
description='the address of the user',
table=addresses_table
)
# add rows to the users table
users_table.add_row(
first_name='Grace',
last_name='Hopper',
address=0 # <-- row index of the address table
)
users_table.add_row(
first_name='Alan',
last_name='Turing',
address=1 # <-- row index of the address table
)
# get the first row of the users table
users_table.get(0)
# get the first row of the users table with a nested dataframe
users_table.get(0, index=False)
# get the first two rows of the users table
users_table.get([0, 1])
# get the first two rows of the users table with nested dataframes
# of the addresses table in the address column
users_table.get([0, 1], index=False)
Note
You can also get rows from a DynamicTable
as a list of
lists where the i-th nested list contains the values for the i-th row. This method is
generally not recommended.
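A minimal sketch of that access pattern (assuming the df keyword of DynamicTable.get controls whether a DataFrame is returned):
users_table.get([0, 1], df=False)  # returns plain lists instead of a DataFrame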
Displaying the contents of a table with references to another table¶
Earlier, we converted a DynamicTable
to a
DataFrame
using
DynamicTable.to_dataframe
and printed the DataFrame
to see its contents.
This also works when the DynamicTable
contains a column
that references another table. However, the entries for this column for each row
will be printed as a nested DataFrame
. This can be difficult to
read, so to view only the row indices of the referenced table, pass
index=True
to
DynamicTable.to_dataframe
.
users_df = users_table.to_dataframe(index=True)
users_df
You can then access the referenced table using the table
attribute of the
column object. This is useful when reading a table from a file where you may not have
a variable to access the referenced table.
First, use DynamicTable.__getitem__
(square brackets notation) to get the
DynamicTableRegion
object representing the column.
Then access its table
attribute to get the addresses table and convert the table
to a DataFrame
.
address_column = users_table['address']
read_addresses_table = address_column.table
addresses_df = read_addresses_table.to_dataframe()
Get the addresses corresponding to the rows of the users table:
address_indices = users_df['address'] # pandas Series of row indices into the addresses table
addresses_df.iloc[address_indices] # use .iloc because these are row indices not ID values
Note
The indices returned by users_df['address']
are row indices and not
the ID values of the table. However, if you are using default IDs, these
values will be the same.
Creating custom DynamicTable subclasses¶
TODO
Defining __columns__
¶
TODO
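A minimal, hypothetical sketch of the pattern already used by SubTable above; the class and column names are invented for illustration:
class TrialsTable(DynamicTable):
    # predefine columns on the class via the __columns__ class attribute
    __columns__ = (
        {'name': 'start_time', 'description': 'start time of the trial', 'required': True},
        {'name': 'stop_time', 'description': 'stop time of the trial', 'required': True},
    )

trials = TrialsTable(name='trials', description='an example table of trials')
trials.add_row(start_time=0.0, stop_time=1.0)  # rows use the predefined column names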
Introduction¶
HDMF provides a high-level Python API for specifying, reading, writing and manipulating hierarchical object data. This section provides a broad overview of the software architecture of HDMF (see Section Software Architecture) and its functionality.
Software Architecture¶
The main goal of HDMF is to enable users and developers to efficiently interact with the hierarchical object data. The following figures provide an overview of the high-level architecture of HDMF and functionality of the various components.

Overview of the high-level software architecture of HDMF (click to enlarge).¶

We choose a modular design for HDMF to enable flexibility and separate the various levels of standardizing hierarchical data (click to enlarge).¶
Main Concepts¶

Overview of the main concepts/classes in HDMF and their location in the overall software architecture (click to enlarge).¶
Container¶
In memory objects
Interface for (most) applications
Similar to a table row
HDMF does not provide these. They are left for standards developers to define how users interact with data.
There are two Container base classes:
- Container – the base class for most container objects
- Data – the base class for objects that wrap datasets (data values)
Main Module:
hdmf.container
Builder¶
Intermediary objects for I/O
Interface for I/O
Backend readers and writers must return and accept these
There are different kinds of builders for different base types:
- GroupBuilder – represents a collection of objects
- DatasetBuilder – represents data
- LinkBuilder – represents soft-links
- RegionBuilder – represents a slice into data (subclass of DatasetBuilder)
Main Module:
hdmf.build.builders
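A rough sketch of what builders look like in code (the names and values are purely illustrative):
from hdmf.build import DatasetBuilder, GroupBuilder

# builders are plain intermediary descriptions of what will be written to storage
dataset_builder = DatasetBuilder(name='data', data=[1, 2, 3])
group_builder = GroupBuilder(name='my_group', datasets={'data': dataset_builder})
group_builder.set_attribute('comment', 'an attribute set by hand')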
Spec¶
Interact with format specifications
Data structures to specify data types and what said types consist of
Python representation for YAML specifications
Interface for writing extensions or custom specification
There are several main specification classes:
- AttributeSpec – specification for metadata
- GroupSpec – specification for a collection of objects (i.e., subgroups, datasets, links)
- DatasetSpec – specification for a dataset (like an n-dimensional array). Specifies data type, dimensions, etc.
- LinkSpec – specification for a link (like a POSIX soft link)
- RefSpec – specification for references (references are like links, but stored as data)
- DtypeSpec – specification for compound data types. Used to build complex data type specifications, e.g., to define tables (used only in DatasetSpec and correspondingly DatasetSpec)
Main Modules:
hdmf.spec
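A small sketch of what building a specification in Python looks like (the type and field names are invented for illustration):
from hdmf.spec import AttributeSpec, DatasetSpec, GroupSpec

# specification for a hypothetical data type with one dataset and one attribute
example_spec = GroupSpec(
    doc='A simple example type',
    data_type_def='SimpleExample',
    datasets=[DatasetSpec(doc='stored values', name='values', dtype='float')],
    attributes=[AttributeSpec(name='unit', doc='unit of the stored values', dtype='text')],
)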
Note
A data_type
defines a reusable type in a format specification that can be
referenced and used elsewhere in other specifications. The specification of
the standard is basically a collection of data_types.
- data_type_inc is used to include an existing type
- data_type_def is used to define a new type
If both keys are defined, then we create a new type that uses/inherits an existing type as a base.
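Continuing the invented example above, a new type that inherits from an existing one combines both keys:
from hdmf.spec import AttributeSpec, GroupSpec

# data_type_def names the new type; data_type_inc names the existing base type
extended_spec = GroupSpec(
    doc='An extension of SimpleExample with an extra attribute',
    data_type_def='ExtendedExample',
    data_type_inc='SimpleExample',
    attributes=[AttributeSpec(name='comment', doc='a free-form comment', dtype='text')],
)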
ObjectMapper¶
Maintains the mapping between Container attributes and Spec components
Provides a way of converting between Container and Builder, giving standards developers the flexibility to present data to users in a user-friendly manner while storing data efficiently
ObjectMappers are constructed using a Spec
Ideally, one ObjectMapper for each data type
Things an ObjectMapper should do:
Main Module:
hdmf.build.objectmapper

Relationship between Container, Builder, ObjectMapper, and Spec¶
Additional Concepts¶
Namespace, NamespaceCatalog, NamespaceBuilder¶
Namespace
A namespace for specifications
Necessary for making standards extensions and standard core specification
Contains basic info about who created extensions
NamespaceCatalog
– A class for managing namespaces
NamespaceBuilder
– A utility for building extensions
TypeMap¶
Map between data types, Container classes (i.e. a Python class object) and corresponding ObjectMapper classes
Constructed from a NamespaceCatalog
Things a TypeMap does:
Given a data_type, return the associated Container class
Given a Container class, return the associated ObjectMapper
HDMF has one of these classes:
the base class (i.e.
TypeMap
)
TypeMaps can be merged, which is useful when combining extensions
BuildManager¶
Constructed from a TypeMap
HDMF only has one of these:
hdmf.build.manager.BuildManager

Overview of BuildManager (and TypeMap) (click to enlarge).¶
HDMFIO¶
Abstract base class for I/O
HDMFIO
has two key abstract methods:
- write_builder – given a builder, write data to storage format
- read_builder – given a handle to storage format, return builder representation
Constructed with a BuildManager
Extend this for creating a new I/O backend
HDMF has one concrete form of this:
HDF5IO
- reading and writing HDF5

Citing HDMF¶
BibTeX entry¶
If you use HDMF in your research, please use the following citation:
@INPROCEEDINGS{9005648,
author={A. J. {Tritt} and O. {Rübel} and B. {Dichter} and R. {Ly} and D. {Kang} and E. F. {Chang} and L. M. {Frank} and K. {Bouchard}},
booktitle={2019 IEEE International Conference on Big Data (Big Data)},
title={HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards},
year={2019},
volume={},
number={},
pages={165-179},
doi={10.1109/BigData47090.2019.9005648}}
Using RRID¶
RRID: (Hierarchical Data Modeling Framework, RRID:SCR_021303)
Using duecredit¶
Citations can be generated using duecredit. To install duecredit, run pip install duecredit
.
You can obtain a list of citations for your Python script, e.g., yourscript.py
, using:
cd /path/to/your/module
python -m duecredit yourscript.py
Alternatively, you can set the environment variable DUECREDIT_ENABLE=yes:
DUECREDIT_ENABLE=yes python yourscript.py
Citations will be saved in a hidden file (.duecredit.p
) in the current directory. You can then use the duecredit
command line tool to export the citations to different formats. For example, you can display your citations in
BibTeX format using:
duecredit summary --format=bibtex
For more information on using duecredit, please consult its homepage.
API Documentation¶
hdmf.common package¶
Subpackages¶
hdmf.common.io package¶
Submodules¶
hdmf.common.io.alignedtable module¶
- class hdmf.common.io.alignedtable.AlignedDynamicTableMap(spec)¶
Bases:
DynamicTableMap
Customize the mapping for AlignedDynamicTable
Create a map from AbstractContainer attributes to specifications
- Parameters:
spec (DatasetSpec or GroupSpec) – The specification for mapping objects to builders
- constructor_args = {'name': <function ObjectMapper.get_container_name>}¶
- obj_attrs = {'colnames': <function DynamicTableMap.attr_columns>}¶
hdmf.common.io.multi module¶
- class hdmf.common.io.multi.SimpleMultiContainerMap(spec)¶
Bases:
ObjectMapper
Create a map from AbstractContainer attributes to specifications
- Parameters:
spec (DatasetSpec or GroupSpec) – The specification for mapping objects to builders
- containers_attr(container, manager)¶
- containers_carg(builder, manager)¶
- datas_attr(container, manager)¶
- constructor_args = {'containers': <function SimpleMultiContainerMap.containers_carg>, 'name': <function ObjectMapper.get_container_name>}¶
- obj_attrs = {'containers': <function SimpleMultiContainerMap.containers_attr>, 'datas': <function SimpleMultiContainerMap.datas_attr>}¶
hdmf.common.io.resources module¶
- class hdmf.common.io.resources.HERDMap(spec)¶
Bases:
ObjectMapper
Create a map from AbstractContainer attributes to specifications
- Parameters:
spec (DatasetSpec or GroupSpec) – The specification for mapping objects to builders
- construct_helper(name, parent_builder, table_cls, manager)¶
Create a new instance of table_cls with data from parent_builder[name].
The DatasetBuilder for name is associated with data_type Data and container class Data, but users should use the more specific table_cls for these datasets.
- keys(builder, manager)¶
- files(builder, manager)¶
- entities(builder, manager)¶
- objects(builder, manager)¶
- object_keys(builder, manager)¶
- entity_keys(builder, manager)¶
- constructor_args = {'entities': <function HERDMap.entities>, 'entity_keys': <function HERDMap.entity_keys>, 'files': <function HERDMap.files>, 'keys': <function HERDMap.keys>, 'name': <function ObjectMapper.get_container_name>, 'object_keys': <function HERDMap.object_keys>, 'objects': <function HERDMap.objects>}¶
- obj_attrs = {}¶
hdmf.common.io.table module¶
- class hdmf.common.io.table.DynamicTableMap(spec)¶
Bases:
ObjectMapper
Create a map from AbstractContainer attributes to specifications
- Parameters:
spec (DatasetSpec or GroupSpec) – The specification for mapping objects to builders
- attr_columns(container, manager)¶
- get_attr_value(spec, container, manager)¶
Get the value of the attribute corresponding to this spec from the given container
- Parameters:
spec (Spec) – the spec to get the attribute value for
container (DynamicTable) – the container to get the attribute value from
manager (BuildManager) – the BuildManager used for managing this build
- constructor_args = {'name': <function ObjectMapper.get_container_name>}¶
- obj_attrs = {'colnames': <function DynamicTableMap.attr_columns>}¶
- class hdmf.common.io.table.DynamicTableGenerator(*args, **kwargs)¶
Bases:
CustomClassGenerator
- classmethod apply_generator_to_field(field_spec, bases, type_map)¶
Return True if this is a DynamicTable and the field spec is a column.
- classmethod process_field_spec(classdict, docval_args, parent_cls, attr_name, not_inherited_fields, type_map, spec)¶
Add __columns__ to the classdict and update the docval args for the field spec with the given attribute name. :param classdict: The dict to update with __columns__. :param docval_args: The list of docval arguments. :param parent_cls: The parent class. :param attr_name: The attribute name of the field spec for the container class to generate. :param not_inherited_fields: Dictionary of fields not inherited from the parent class. :param type_map: The type map to use. :param spec: The spec for the container class to generate.
- classmethod post_process(classdict, bases, docval_args, spec)¶
Convert classdict[‘__columns__’] to tuple. :param classdict: The class dictionary. :param bases: The list of base classes. :param docval_args: The dict of docval arguments. :param spec: The spec for the container class to generate.
Module contents¶
Submodules¶
hdmf.common.alignedtable module¶
Collection of Container classes for interacting with aligned and hierarchical dynamic tables
- class hdmf.common.alignedtable.AlignedDynamicTable(name, description, id=None, columns=None, colnames=None, target_tables=None, category_tables=None, categories=None)¶
Bases:
DynamicTable
DynamicTable container that supports storing a collection of subtables. Each sub-table is a DynamicTable itself that is aligned with the main table by row index. I.e., all DynamicTables stored in this group MUST have the same number of rows. This type effectively defines a 2-level table in which the main data is stored in the main table implemented by this type and additional columns of the table are grouped into categories, with each category being represented by a separate DynamicTable stored within the group.
- NOTE: To remain compatible with DynamicTable, the attribute colnames represents only the
columns of the main table (not including the category tables). To get the full list of column names, use the get_colnames() function instead.
- Parameters:
name (str) – the name of this table
description (str) – a description of what is in this table
id (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO or ElementIdentifiers) – the identifiers for this table
colnames (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – the ordered names of the columns in this table. columns must also be provided.
target_tables (dict) – dict mapping DynamicTableRegion column name to the table that the DTR points to. The column is added to the table if it is not already present (i.e., when it is optional).
category_tables (list) – List of DynamicTables to be added to the container. NOTE - Only regular DynamicTables are allowed. Using AlignedDynamicTable as a category for AlignedDynamicTable is currently not supported.
categories (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – List of names with the ordering of category tables
- property category_tables¶
List of DynamicTables to be added to the container. NOTE - Only regular DynamicTables are allowed. Using AlignedDynamicTable as a category for AlignedDynamicTable is currently not supported.
- property categories¶
Get the list of names of the categories
Short-hand for list(self.category_tables.keys())
- Raises:
KeyError if the given name is not in self.category_tables
- add_category(category)¶
Add a new DynamicTable to the AlignedDynamicTable to create a new category in the table.
NOTE: The table must align with (i.e, have the same number of rows as) the main data table (and other category tables). I.e., if the AlignedDynamicTable is already populated with data then we have to populate the new category with the corresponding data before adding it.
- raises:
ValueError is raised if the input table does not have the same number of rows as the main table. ValueError is raised if the table is an AlignedDynamicTable instead of regular DynamicTable.
- Parameters:
category (
DynamicTable
) – Add a new DynamicTable category
- add_column(name, description, data=[], table=False, index=False, enum=False, col_cls=<class 'hdmf.common.table.VectorData'>, check_ragged=True, category=None)¶
Add a column to the table
- raises:
KeyError if the category does not exist
- Parameters:
name (str) – the name of this VectorData
description (str) – a description for this column
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – a dataset where the first dimension is a concatenation of multiple vectors
table (bool or DynamicTable) – whether or not this is a table region or the table the region applies to
index (bool or VectorIndex or ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or int) –
- False (default): do not generate a VectorIndex
- True: generate one empty VectorIndex
- VectorIndex: use the supplied VectorIndex
- array-like of ints: create a VectorIndex and use these values as the data
- int: recursively create n VectorIndex objects for a multi-ragged array
enum (bool or ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – whether or not this column contains data from a fixed set of elements
col_cls (type) – class to use to represent the column data. If table=True, this field is ignored and a DynamicTableRegion object is used. If enum=True, this field is ignored and an EnumData object is used.
check_ragged (bool) – whether or not to check for ragged arrays when adding data to the table. Set to False to avoid checking every element if performance issues occur.
category (str) – The category the column should be added to
- add_row(data=None, id=None, enforce_unique_id=False)¶
We can provide the row data either as a single dict or as a dict for each category
- get_colnames(include_category_tables=False, ignore_category_ids=False)¶
Get the full list of names of columns for this table
- returns:
List of tuples (str, str) where the first string is the name of the DynamicTable that contains the column and the second string is the name of the column. If include_category_tables is False, then a list of column names is returned.
- to_dataframe(ignore_category_ids=False)¶
Convert the collection of tables to a single pandas DataFrame
- Parameters:
ignore_category_ids (
bool
) – Ignore id columns of sub-category tables
- __getitem__(item)¶
Called to implement standard array slicing syntax.
Same as
self.get(item)
. Seeget
for details.
- get(item, **kwargs)¶
Access elements (rows, columns, category tables etc.) from the table. Instead of calling this function directly, the class also implements standard array slicing syntax via
__getitem__
(which calls this function). For example, instead of callingself.get(item=slice(2,5))
we may use the often more convenient form ofself[2:5]
instead.- Parameters:
item – Selection defining the items of interest. This may be either a:
- int, list, array, slice: Return one or multiple rows of the table as a pandas.DataFrame. For example:
self[0]: Select the first row of the table
self[[0,3]]: Select the first and fourth row of the table
self[1:4]: Select the rows with index 1, 2, 3 from the table
- string: Return a column from the main table or a category table. For example:
self['column']: Return the column from the main table.
self['my_category']: Return a DataFrame of the my_category category table. This is a shorthand for self.get_category('my_category').to_dataframe().
- tuple: Get a column, row, or cell from a particular category table. The tuple is expected to consist of the following elements:
category: string with the name of the category. To select from the main table use self.name or None.
column: string with the name of the column, and
row: integer index of the row.
The tuple itself then may take the following forms:
- Select a single column from a table via:
self[category, column]
- Select a single full row of a given category table via:
self[row, category] (recommended, for consistency with DynamicTable)
self[category, row]
- Select a single cell via:
self[row, (category, column)] (recommended, for consistency with DynamicTable)
self[row, category, column]
self[category, column, row]
- Returns:
Depending on the type of selection the function returns a:
pandas.DataFrame: when retrieving a row or category table
array : when retrieving a single column
single value : when retrieving a single cell. The data type and shape will depend on the data type and shape of the cell/column.
- has_foreign_columns(ignore_category_tables=False)¶
Does the table contain DynamicTableRegion columns
- returns:
True if the table or any of the category tables contains a DynamicTableRegion column, else False
- Parameters:
ignore_category_tables (
bool
) – Ignore the category tables and only check in the main table columns
- get_foreign_columns(ignore_category_tables=False)¶
- Determine the names of all columns that link to another DynamicTable, i.e.,
find all DynamicTableRegion type columns. Similar to a foreign key in a database, a DynamicTableRegion column references elements in another table.
- returns:
List of tuples (str, str) where the first string is the name of the category table (or None if the column is in the main table) and the second string is the column name.
- Parameters:
ignore_category_tables (
bool
) – Ignore the category tables and only check in the main table columns
- get_linked_tables(other_tables=None, ignore_category_tables=False)¶
- Get the full list of all tables that are being linked to directly or indirectly
from this table via foreign DynamicTableColumns included in this table or in any table that can be reached through DynamicTableRegion columns
- Returns: List of dicts with the following keys:
‘source_table’ : The source table containing the DynamicTableRegion column
‘source_column’ : The relevant DynamicTableRegion column in the ‘source_table’
‘target_table’ : The target DynamicTable; same as source_column.table.
- Parameters:
- data_type = 'AlignedDynamicTable'¶
- namespace = 'hdmf-common'¶
hdmf.common.hierarchicaltable module¶
Module providing additional functionality for dealing with hierarchically nested tables, i.e., tables containing DynamicTableRegion references.
- hdmf.common.hierarchicaltable.to_hierarchical_dataframe(dynamic_table)¶
Create a hierarchical pandas.DataFrame that represents all data from a collection of linked DynamicTables.
LIMITATIONS: Currently this function only supports DynamicTables with a single DynamicTableRegion column. If a table has more than one DynamicTableRegion column then the function will expand only the first DynamicTableRegion column found for each table. Any additional DynamicTableRegion columns will remain nested.
NOTE: Some useful functions for further processing of the generated DataFrame include:
pandas.DataFrame.reset_index to turn the data from the pandas.MultiIndex into columns
drop_id_columns to remove all ‘id’ columns
flatten_column_index to flatten the column index
- Parameters:
dynamic_table (DynamicTable) – DynamicTable object to be converted to a hierarchical pandas.Dataframe
- Returns:
Hierarchical pandas.DataFrame with usually a pandas.MultiIndex on both the index and columns.
- Return type:
- hdmf.common.hierarchicaltable.drop_id_columns(dataframe, inplace=False)¶
Drop all columns named ‘id’ from the table.
In case a column name is a tuple, the function will drop any column for which the inner-most name is ‘id’. The ‘id’ columns of DynamicTable are in many cases not necessary for analysis or display. This function allows us to easily filter out all those columns.
- raises TypeError:
In case that dataframe parameter is not a pandas.Dataframe.
- hdmf.common.hierarchicaltable.flatten_column_index(dataframe, max_levels=None, inplace=False)¶
Flatten the column index of a pandas DataFrame.
The functions changes the dataframe.columns from a pandas.MultiIndex to a normal Index, with each column usually being identified by a tuple of strings. This function is typically used in conjunction with DataFrames generated by
to_hierarchical_dataframe
- raises ValueError:
In case max_levels is not > 0
- raises TypeError:
In case that dataframe parameter is not a pandas.Dataframe.
- Parameters:
dataframe (DataFrame) – Pandas dataframe to update (usually generated by the to_hierarchical_dataframe function)
max_levels (int or integer) – Maximum number of levels to use in the resulting column Index. NOTE: When limiting the number of levels the function simply removes levels from the beginning. As such, removing levels may result in columns with duplicate names. Value must be > 0.
inplace (bool) – Update the dataframe inplace or return a modified copy
- Returns:
pandas.DataFrame with a regular pandas.Index columns rather and a pandas.MultiIndex
- Return type:
hdmf.common.multi module¶
- class hdmf.common.multi.SimpleMultiContainer(name, containers=None)¶
Bases:
MultiContainerInterface
- Parameters:
- property containers¶
a dictionary containing the Container or Data in this SimpleMultiContainer
- __getitem__(name=None)¶
Get a Container from this SimpleMultiContainer
- add_container(containers)¶
Add one or multiple Container or Data objects to this SimpleMultiContainer
- data_type = 'SimpleMultiContainer'¶
- get_container(name=None)¶
Get a Container from this SimpleMultiContainer
- namespace = 'hdmf-common'¶
hdmf.common.resources module¶
- class hdmf.common.resources.KeyTable(name='keys', data=[])¶
Bases:
Table
A table for storing keys used to reference external resources.
- Parameters:
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.Key(key, table=None, idx=None)¶
Bases:
Row
A Row class for representing rows in the KeyTable.
- Parameters:
- todict()¶
- class hdmf.common.resources.EntityTable(name='entities', data=[])¶
Bases:
Table
A table for storing the external resources a key refers to.
- Parameters:
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.Entity(entity_id, entity_uri, table=None, idx=None)¶
Bases:
Row
A Row class for representing rows in the EntityTable.
- Parameters:
- todict()¶
- class hdmf.common.resources.FileTable(name='files', data=[])¶
Bases:
Table
A table for storing file ids used in external resources.
- Parameters:
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.File(file_object_id, table=None, idx=None)¶
Bases:
Row
A Row class for representing rows in the FileTable.
- Parameters:
- todict()¶
- class hdmf.common.resources.ObjectTable(name='objects', data=[])¶
Bases:
Table
A table for storing objects (i.e. Containers) that contain keys that refer to external resources.
- Parameters:
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- add_row(files_idx, object_id, object_type, relative_path, field)¶
- Parameters:
files_idx (int) – The row idx for the file_object_id in FileTable containing the object.
object_id (str) – The object ID for the Container/Data.
object_type (str) – The type of the object. This is also the parent in relative_path.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource. Use an empty string if not applicable.
- class hdmf.common.resources.Object(files_idx, object_id, object_type, relative_path, field, table=None, idx=None)¶
Bases:
Row
A Row class for representing rows in the ObjectTable.
- Parameters:
files_idx (int) – The row idx for the file_object_id in FileTable containing the object.
object_id (str) – The object ID for the Container/Data.
object_type (str) – The type of the object. This is also the parent in relative_path.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource. Use an empty string if not applicable.
table (Table) – None
idx (int) – None
- todict()¶
- class hdmf.common.resources.ObjectKeyTable(name='object_keys', data=[])¶
Bases:
Table
A table for identifying which keys are used by which objects for referring to external resources.
- Parameters:
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.EntityKeyTable(name='entity_keys', data=[])¶
Bases:
Table
A table for identifying which entities are used by which keys for referring to external resources.
- Parameters:
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.EntityKey(entities_idx, keys_idx, table=None, idx=None)¶
Bases:
Row
A Row class for representing rows in the EntityKeyTable.
- Parameters:
- todict()¶
- class hdmf.common.resources.ObjectKey(objects_idx, keys_idx, table=None, idx=None)¶
Bases:
Row
A Row class for representing rows in the ObjectKeyTable.
- Parameters:
- todict()¶
- class hdmf.common.resources.HERD(keys=None, files=None, entities=None, objects=None, object_keys=None, entity_keys=None, type_map=None)¶
Bases:
Container
HDMF External Resources Data Structure. A table for mapping user terms (i.e. keys) to resource entities.
- Parameters:
keys (KeyTable) – The table storing user keys for referencing resources.
files (FileTable) – The table for storing file ids used in external resources.
entities (EntityTable) – The table storing entity information.
objects (ObjectTable) – The table storing object information.
object_keys (ObjectKeyTable) – The table storing object-key relationships.
entity_keys (EntityKeyTable) – The table storing entity-key relationships.
type_map (TypeMap) – The type map. If None is provided, the HDMF-common type map will be used.
- property keys¶
The table storing user keys for referencing resources.
- property files¶
The table for storing file ids used in external resources.
- property entities¶
The table storing entity information.
- property objects¶
The table storing object information.
- property object_keys¶
The table storing object-key relationships.
- property entity_keys¶
The table storing entity-key relationships.
- static assert_external_resources_equal(left, right, check_dtype=True)¶
Compare that the keys, resources, entities, objects, and object_keys tables match
- Parameters:
left – HERD object to compare with right
right – HERD object to compare with left
check_dtype – Enforce strict checking of dtypes. Dtypes may be different for example for ids, where depending on how the data was saved ids may change from int64 to int32. (Default: True)
- Returns:
The function returns True if all values match. If mismatches are found, AssertionError will be raised.
- Raises:
AssertionError – Raised if any differences are found. The function collects all differences into a single error so that the assertion will indicate all found differences.
- add_ref_container(root_container)¶
- Method to search through the root_container for all instances of TermSet.
Currently, only datasets are supported. By using a TermSet, the data comes validated and can use the permissible values within the set to populate HERD.
- Parameters:
root_container (
HERDManager
) – The root container or file containing objects with a TermSet.
- add_ref_termset(termset, file=None, container=None, attribute=None, field='', key=None)¶
- This method allows users to take advantage of using the TermSet class to provide the entity information
for add_ref, while also validating the data. This method supports adding a single key or an entire dataset to the HERD tables. For both cases, the term, i.e., key, will be validated against the permissible values in the TermSet. If valid, it will proceed to call add_ref. Otherwise, the method will return a dict of missing terms (terms not found in the TermSet).
- Parameters:
termset (TermSet) – The TermSet to be used if the container/attribute does not have one.
file (HERDManager) – The file associated with the container.
container (str or AbstractContainer) – The Container/Data object that uses the key or the object_id for the Container/Data object that uses the key.
attribute (str) – The attribute of the container for the external reference.
field (str) – The field of the compound data type using an external resource.
key (str or Key) – The name of the key or the Key object from the KeyTable for the key to add a resource for.
- add_ref(entity_id, container=None, attribute=None, field='', key=None, entity_uri=None, file=None)¶
Add information about an external reference used in this file.
It is possible to use the same name of the key to refer to different resources so long as the name of the key is not used within the same object, relative_path, and field combination. This method does not support such functionality by default.
- Parameters:
entity_id (str) – The identifier for the entity at the resource.
container (str or AbstractContainer) – The Container/Data object that uses the key or the object_id for the Container/Data object that uses the key.
attribute (str) – The attribute of the container for the external reference.
field (str) – The field of the compound data type using an external resource.
key (str or Key) – The name of the key or the Key object from the KeyTable for the key to add a resource for.
entity_uri (str) – The URI for the identifier at the resource.
file (HERDManager) – The file associated with the container.
- get_key(key_name, file=None, container=None, relative_path='', field='')¶
Return a Key.
If container, relative_path, and field are provided, the Key that corresponds to the given name of the key for the given container, relative_path, and field is returned.
If there are multiple matches, a list of all matching keys will be returned.
- Parameters:
key_name (str) – The name of the Key to get.
file (HERDManager) – The file associated with the container.
container (str or AbstractContainer) – The Container/Data object that uses the key or the object id for the Container/Data object that uses the key.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource.
- get_object_type(object_type, relative_path='', field='', all_instances=False)¶
Get all entities/resources associated with an object_type.
- Parameters:
object_type (str) – The type of the object. This is also the parent in relative_path.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource.
all_instances (bool) – Whether to return a dataframe with all instances of the object_type. If True, relative_path and field inputs will be ignored.
- get_object_entities(container, file=None, attribute=None, relative_path='', field='')¶
Get all entities/resources associated with an object.
- Parameters:
container (str or AbstractContainer) – The Container/data object that is linked to resources/entities.
file (HERDManager) – The file.
attribute (str) – The attribute of the container for the external reference.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource.
- to_dataframe(use_categories=False)¶
- Convert the data from the keys, resources, entities, objects, and object_keys tables
to a single joint dataframe. I.e., the data is denormalized here; e.g., keys that are used across multiple entities or objects will be duplicated across the corresponding rows.
- Returns:
DataFrame with all data merged into a single, flat, denormalized table.
- to_zip(path)¶
Write the tables in HERD to zipped tsv files.
- Parameters:
  - path (str) – The path to the zip file.
- classmethod get_zip_directory(path)¶
Return the directory of the file given.
- Parameters:
  - path (str) – The path to the zip file.
- classmethod from_zip(path)¶
Method to read in zipped tsv files to populate HERD.
- Parameters:
  - path (str) – The path to the zip file.
- data_type = 'HERD'¶
- namespace = 'hdmf-experimental'¶
hdmf.common.sparse module¶
- class hdmf.common.sparse.CSRMatrix(data, indices=None, indptr=None, shape=None, name='csr_matrix')¶
Bases:
Container
- Parameters:
  - data (csr_matrix or ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – the data to use for this CSRMatrix or the CSR data array. If passing a CSR data array, indices, indptr, and shape must also be provided.
  - indices (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – CSR index array
  - indptr (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – CSR index pointer array
  - shape (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – the shape of the matrix
  - name (str) – the name to use for this when storing
- to_spmat()¶
- data_type = 'CSRMatrix'¶
- namespace = 'hdmf-common'¶
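As a quick illustration, a CSRMatrix can be constructed either from the raw CSR arrays or from an existing scipy.sparse.csr_matrix, and to_spmat() converts back to a SciPy sparse matrix. This is a minimal sketch; the values are arbitrary.
import numpy as np
from scipy.sparse import csr_matrix
from hdmf.common.sparse import CSRMatrix
# Build from raw CSR arrays (data, indices, indptr, shape)...
mat = CSRMatrix(
    data=np.array([1, 2, 3], dtype=np.int32),
    indices=np.array([0, 2, 1], dtype=np.uint32),
    indptr=np.array([0, 2, 3], dtype=np.uint32),
    shape=(2, 3),
)
# ...or wrap an existing scipy.sparse.csr_matrix directly.
mat2 = CSRMatrix(data=csr_matrix(np.eye(3)))
# Convert back to a scipy.sparse.csr_matrix for computation.
print(mat.to_spmat().toarray())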
hdmf.common.table module¶
Collection of Container classes for interacting with data types related to the storage and use of dynamic data tables as part of the hdmf-common schema
- class hdmf.common.table.VectorData(name, description, data=[])¶
Bases:
Data
An n-dimensional dataset representing a column of a DynamicTable. If used without an accompanying VectorIndex, the first dimension is along the rows of the DynamicTable and each step along the first dimension is a cell of the larger table. VectorData can also be used to represent a ragged array if paired with a VectorIndex. This allows for storing arrays of varying length in a single cell of the DynamicTable by indexing into this VectorData. The first vector is at VectorData[0:VectorIndex(0)+1]. The second vector is at VectorData[VectorIndex(0)+1:VectorIndex(1)+1], and so on.
- Parameters:
  - name (str) – the name of this VectorData
  - description (str) – a description for this column
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – a dataset where the first dimension is a concatenation of multiple vectors
- property description¶
a description for this column
- add_row(val)¶
Append a data value to this VectorData column
- Parameters:
val (None) – the value to add to this column
- get(key, **kwargs)¶
Retrieve elements from this VectorData
- Parameters:
key – Selection of the elements
kwargs – Ignored
- extend(ar, **kwargs)¶
Add all elements of the iterable arg to the end of this VectorData.
Each subclass of VectorData should have its own extend method to ensure functionality and efficiency.
- Parameters:
arg – The iterable to add to the end of this VectorData
- data_type = 'VectorData'¶
- namespace = 'hdmf-common'¶
- class hdmf.common.table.VectorIndex(name, data, target)¶
Bases:
VectorData
When paired with a VectorData, this allows for storing arrays of varying length in a single cell of the DynamicTable by indexing into this VectorData. The first vector is at VectorData[0:VectorIndex(0)+1]. The second vector is at VectorData[VectorIndex(0)+1:VectorIndex(1)+1], and so on.
- Parameters:
  - name (str) – the name of this VectorIndex
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – a 1D dataset containing indexes that apply to VectorData object
  - target (VectorData) – the target dataset that this index applies to
- property target¶
the target dataset that this index applies to
- add_vector(arg, **kwargs)¶
Add the given data value to the target VectorData and append the corresponding index to this VectorIndex.
- Parameters:
arg – The data value to be added to self.target
- add_row(arg, **kwargs)¶
Convenience function. Same as add_vector.
- __getitem__(arg)¶
Select elements in this VectorIndex and retrieve the corresponding data from the self.target VectorData
- Parameters:
arg – slice or integer index indicating the elements we want to select in this VectorIndex
- Returns:
Scalar or list of values retrieved
- get(arg, **kwargs)¶
Select elements in this VectorIndex and retrieve the corresponding data from the self.target VectorData
- Parameters:
arg – slice or integer index indicating the elements we want to select in this VectorIndex
kwargs – any additional arguments to get method of the self.target VectorData
- Returns:
Scalar or list of values retrieved
- data_type = 'VectorIndex'¶
- namespace = 'hdmf-common'¶
- class hdmf.common.table.ElementIdentifiers(name, data=[])¶
Bases:
Data
Data container with a list of unique identifiers for values within a dataset, e.g. rows of a DynamicTable.
- Parameters:
  - name (str) – the name of this ElementIdentifiers
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – a 1D dataset containing integer identifiers
- data_type = 'ElementIdentifiers'¶
- namespace = 'hdmf-common'¶
- class hdmf.common.table.DynamicTable(name, description, id=None, columns=None, colnames=None, target_tables=None)¶
Bases:
Container
A column-based table. Columns are defined by the argument columns. This argument must be a list/tuple of VectorData and VectorIndex objects or a list/tuple of dicts containing the keys name and description that provide the name and description of each column in the table. Additionally, the keys index, table, and enum can be used for specifying additional structure to the table columns. Setting the key index to True indicates that the VectorData column will store a ragged array (i.e., it will be accompanied by a VectorIndex). Setting the key table to True indicates that the column will store regions of another DynamicTable. Setting the key enum to True indicates that the column data will come from a fixed set of values.
Columns in DynamicTable subclasses can be statically defined by specifying the class attribute __columns__, rather than specifying them at runtime at the instance level (see the example at the end of this class entry). This is useful for defining a table structure that will get reused. The requirements for __columns__ are the same as the requirements described above for specifying table columns with the columns argument to the DynamicTable constructor.
- Parameters:
  - name (str) – the name of this table
  - description (str) – a description of what is in this table
  - id (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO or ElementIdentifiers) – the identifiers for this table
  - columns (list or tuple) – the columns in this table
  - colnames (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – the ordered names of the columns in this table. columns must also be provided.
  - target_tables (dict) – dict mapping DynamicTableRegion column name to the table that the DTR points to. The column is added to the table if it is not already present (i.e., when it is optional).
- property description¶
a description of what is in this table
- property id¶
the identifiers for this table
- property colnames¶
the ordered names of the columns in this table. columns must also be provided.
- property columns¶
the columns in this table
- add_row(data=None, id=None, enforce_unique_id=False, check_ragged=True)¶
Add a row to the table. If id is not provided, it will auto-increment.
- Parameters:
  - data (dict) – the data to put in this row
  - id (int) – the ID for the row
  - enforce_unique_id (bool) – enforce that the id in the table must be unique
  - check_ragged (bool) – whether or not to check for ragged arrays when adding data to the table. Set to False to avoid checking every element if performance issues occur.
- add_column(name, description, data=[], table=False, index=False, enum=False, col_cls=<class 'hdmf.common.table.VectorData'>, check_ragged=True)¶
Add a column to this table.
If data is provided, it must contain the same number of rows as the current state of the table.
Extra keyword arguments will be passed to the constructor of the column class (“col_cls”).
- raises ValueError:
if the column has already been added to the table
- Parameters:
  - name (str) – the name of this VectorData
  - description (str) – a description for this column
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – a dataset where the first dimension is a concatenation of multiple vectors
  - table (bool or DynamicTable) – whether or not this is a table region or the table the region applies to
  - index (bool or VectorIndex or ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or int) –
    - False (default): do not generate a VectorIndex
    - True: generate one empty VectorIndex
    - VectorIndex: use the supplied VectorIndex
    - array-like of ints: create a VectorIndex and use these values as the data
    - int: recursively create n VectorIndex objects for a multi-ragged array
  - enum (bool or ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – whether or not this column contains data from a fixed set of elements
  - col_cls (type) – class to use to represent the column data. If table=True, this field is ignored and a DynamicTableRegion object is used. If enum=True, this field is ignored and an EnumData object is used.
  - check_ragged (bool) – whether or not to check for ragged arrays when adding data to the table. Set to False to avoid checking every element if performance issues occur.
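As a quick sketch of the runtime API described above (the table and column names are made up for illustration), a table can be built by adding columns, including a ragged column via index=True, and then populated row by row:
from hdmf.common import DynamicTable
# Build a small table at runtime.
table = DynamicTable(name="trials", description="example trial table")
table.add_column(name="start_time", description="start time of each trial (s)")
table.add_column(name="tags", description="free-form tags per trial", index=True)  # ragged column
# add_row accepts one keyword argument per column; ragged cells take a list.
table.add_row(start_time=0.0, tags=["left", "correct"])
table.add_row(start_time=1.5, tags=["right"])
print(table["start_time"][:])   # [0.0, 1.5]
print(table["tags"][0])         # ['left', 'correct']
print(table.to_dataframe())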
- create_region(name, region, description)¶
Create a DynamicTableRegion selecting a region (i.e., rows) in this DynamicTable.
- raises:
IndexError if the provided region contains invalid indices
- __getitem__(key)¶
- get(key, default=None, df=True, index=True, **kwargs)¶
Select a subset from the table.
If the table includes a DynamicTableRegion column, then by default, the index/indices of the DynamicTableRegion will be returned. If df=True and index=False, then the returned pandas DataFrame will contain a nested DataFrame in each row of the DynamicTableRegion column. If df=False and index=True, then a list of lists will be returned where the list containing the DynamicTableRegion column contains the indices of the DynamicTableRegion. Note that in this case, the DynamicTable referenced by the DynamicTableRegion can be accessed through the table attribute of the DynamicTableRegion object. df=False and index=False is not yet supported.
- Parameters:
  - key – Key defining which elements of the table to select. This may be one of the following:
    - a string with the name of the column to select
    - a tuple consisting of (int, str) where the int selects the row and the string identifies the column to select by name
    - an int, list of ints, array, or slice selecting a set of full rows in the table. If an int is used, then scalars are returned for each column that has a single value. If a list, array, or slice is used and df=False, then lists are returned for each column, even if the list, array, or slice resolves to a single row.
- Returns:
  - If key is a string, then return the VectorData object representing the column with the string name
  - If key is a tuple of (int, str), then return the scalar value of the selected cell
  - If key is an int, list, np.ndarray, or slice, then return a pandas.DataFrame or lists consisting of one or more rows
- Raises:
KeyError
- get_foreign_columns()¶
Determine the names of all columns that link to another DynamicTable, i.e., find all DynamicTableRegion type columns. Similar to a foreign key in a database, a DynamicTableRegion column references elements in another table.
- Returns:
List of strings with the column names
- has_foreign_columns()¶
Does the table contain DynamicTableRegion columns
- Returns:
True if the table contains a DynamicTableRegion column, else False
- get_linked_tables(other_tables=None)¶
- Get the full list of all tables that are linked to, directly or indirectly, from this table
via DynamicTableRegion columns included in this table or in any table that can be reached through DynamicTableRegion columns
- Returns: List of NamedTuple objects with:
‘source_table’ : The source table containing the DynamicTableRegion column
‘source_column’ : The relevant DynamicTableRegion column in the ‘source_table’
‘target_table’ : The target DynamicTable; same as source_column.table.
- to_dataframe(exclude=None, index=False)¶
Produce a pandas DataFrame containing this table’s data.
If this table contains a DynamicTableRegion column, then by default the returned DataFrame will contain a nested DataFrame in each row of that column (see get).
If exclude is None, this is equivalent to table.get(slice(None, None, None), index=False).
- classmethod from_dataframe(df, name, index_column=None, table_description='', columns=None)¶
Construct an instance of DynamicTable (or a subclass) from a pandas DataFrame.
The columns of the resulting table are defined by the columns of the dataframe and the index by the dataframe’s index (make sure it has a name!) or by a column whose name is supplied to the index_column parameter. We recommend that you supply columns (a list/tuple of dictionaries containing the name and description of each column) to help others understand the contents of your table. See DynamicTable for more details on columns.
- Parameters:
  - df (DataFrame) – source DataFrame
  - name (str) – the name of this table
  - index_column (str) – if provided, this column will become the table’s index
  - table_description (str) – a description of what is in the resulting table
  - columns (list or tuple) – a list/tuple of dictionaries specifying columns in the table
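A small sketch of round-tripping through pandas (the column names and values are arbitrary):
import pandas as pd
from hdmf.common import DynamicTable
df = pd.DataFrame({"start_time": [0.0, 1.5], "condition": ["a", "b"]})
df.index.name = "id"  # the index must have a name (or use index_column)
table = DynamicTable.from_dataframe(
    df,
    name="trials",
    table_description="trials imported from a DataFrame",
    columns=[
        {"name": "start_time", "description": "start time of each trial (s)"},
        {"name": "condition", "description": "experimental condition"},
    ],
)
print(table.to_dataframe())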
- copy()¶
Return a copy of this DynamicTable. This is useful for linking.
- data_type = 'DynamicTable'¶
- namespace = 'hdmf-common'¶
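And a sketch of the static __columns__ pattern mentioned in the class description. The class and column names are illustrative, and registering the subclass with a namespace/type map (needed before it can be written to file) is omitted here:
from hdmf.common import DynamicTable
# Hypothetical subclass whose columns are declared statically via __columns__.
class TrialsTable(DynamicTable):
    __columns__ = (
        {"name": "start_time", "description": "start time of each trial (s)", "required": True},
        {"name": "tags", "description": "free-form tags per trial", "index": True, "required": False},
    )
trials = TrialsTable(name="trials", description="trial metadata")
trials.add_row(start_time=0.0, tags=["left", "correct"])  # optional ragged column added on first use
print(trials.colnames)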
- class hdmf.common.table.DynamicTableRegion(name, data, description, table=None)¶
Bases:
VectorData
DynamicTableRegion provides a link from one table to an index or region of another. The table attribute is another DynamicTable, indicating which table is referenced. The data is int(s) indicating the row(s) (0-indexed) of the target array. DynamicTableRegions can be used to associate multiple rows with the same meta-data without data duplication. They can also be used to create hierarchical relationships between multiple DynamicTables. DynamicTableRegion objects may be paired with a VectorIndex object to create ragged references, so a single cell of a DynamicTable can reference many rows of another DynamicTable.
- Parameters:
  - name (str) – the name of this VectorData
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – a dataset where the first dimension is a concatenation of multiple vectors
  - description (str) – a description of what this region represents
  - table (DynamicTable) – the DynamicTable this region applies to
- property table¶
The DynamicTable this DynamicTableRegion is pointing to
- __getitem__(arg)¶
- get(arg, index=False, df=True, **kwargs)¶
Subset the DynamicTableRegion
- Parameters:
  - arg – Key defining which elements of the table to select. This may be one of the following:
    - a string with the name of the column to select
    - a tuple consisting of (int, str) where the int selects the row and the string identifies the column to select by name
    - an int, list of ints, array, or slice selecting a set of full rows in the table. If an int is used, then scalars are returned for each column that has a single value. If a list, array, or slice is used and df=False, then lists are returned for each column, even if the list, array, or slice resolves to a single row.
  - index – Boolean indicating whether to return indices of the DTR (default False)
  - df – Boolean indicating whether to return the result as a pandas DataFrame (default True)
- Returns:
Result from self.table[…] with the appropriate selection based on the rows selected by this DynamicTableRegion
- to_dataframe(**kwargs)¶
Convert the whole DynamicTableRegion to a pandas dataframe.
Keyword arguments are passed through to the to_dataframe method of DynamicTable that is being referenced (i.e., self.table). This allows specification of the ‘exclude’ parameter and any other parameters of DynamicTable.to_dataframe.
- property shape¶
Define the shape, i.e., (num_rows, num_columns), of the selected table region.
- Returns:
Shape tuple with two integers indicating the number of rows and number of columns
- data_type = 'DynamicTableRegion'¶
- namespace = 'hdmf-common'¶
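A brief sketch of linking one table to another with a DynamicTableRegion column created through add_column(table=...) (the table and column names are illustrative):
from hdmf.common import DynamicTable
# Target table with the shared metadata.
groups = DynamicTable(name="groups", description="metadata shared by many rows")
groups.add_column(name="location", description="where the group lives")
groups.add_row(location="left hemisphere")
groups.add_row(location="right hemisphere")
# Source table with a DynamicTableRegion column pointing into `groups`.
units = DynamicTable(name="units", description="per-unit data")
units.add_column(name="group", description="group of each unit", table=groups)
units.add_row(group=0)
units.add_row(group=1)
units.add_row(group=0)
print(units["group"].table is groups)   # True: the column references the `groups` table
print(units.to_dataframe(index=True))   # keep raw row indices instead of nested frames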
- class hdmf.common.table.EnumData(name, description, data=[], elements=[])¶
Bases:
VectorData
An n-dimensional dataset that can contain elements from a fixed set of elements.
- Parameters:
  - name (str) – the name of this column
  - description (str) – a description for this column
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – integers that index into elements for the value of each row
  - elements (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO or VectorData) – lookup values for each integer in data
- property elements¶
lookup values for each integer in data
- data_type = 'EnumData'¶
- namespace = 'hdmf-experimental'¶
- __getitem__(arg)¶
- get(arg, index=False, join=False, **kwargs)¶
Return elements for the given argument.
- add_row(val, index=False)¶
Append a data value to this EnumData column
If an element is provided for val (i.e. index is False), the correct index value will be determined. Otherwise, val will be added as provided.
- Parameters:
  - val (None) – the value to add to this column
  - index (bool) – whether or not the value being added is an index
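A minimal sketch of an enumerated column constructed directly (the values are illustrative); data holds integer codes and elements holds the lookup values they index into:
from hdmf.common.table import EnumData
cell_type = EnumData(
    name="cell_type",
    description="type of each cell",
    data=[0, 1, 0, 2],
    elements=["pyramidal", "basket", "chandelier"],
)
print(cell_type[0])                 # 'pyramidal' (code 0 resolved through elements)
print(cell_type.get(slice(0, 3)))   # resolved values for the first three rows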
Module contents¶
This package contains functions, classes, and objects for reading and writing data according to the HDMF-common specification
- hdmf.common.load_type_config(config_path, type_map=None)¶
- This method will either load the default config or the config provided by the path.
NOTE: This config is global and shared across all type maps.
- hdmf.common.get_loaded_type_config(type_map=None)¶
This method returns the entire config file.
- Parameters:
  - type_map (TypeMap) – The TypeMap.
- hdmf.common.unload_type_config(type_map=None)¶
Unload the configuration file.
- Parameters:
  - type_map (TypeMap) – The TypeMap.
- hdmf.common.register_class(data_type, namespace='hdmf-common', container_cls=None)¶
- Register a Container class to use for reading and writing a data_type from a specification
If container_cls is not specified, returns a decorator for registering a Container subclass as the class for data_type in namespace.
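A hedged sketch of the decorator form. It assumes a namespace (here called 'my-lab') that defines a MyContainer data type has already been created and loaded, e.g. via load_namespaces, which is not shown:
from hdmf.common import register_class
from hdmf.container import Container
# Assumes the 'my-lab' namespace, with a MyContainer type, is already loaded.
@register_class('MyContainer', 'my-lab')
class MyContainer(Container):
    """Custom Container class used when reading/writing the MyContainer data_type."""
    pass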
- hdmf.common.register_map(container_cls, mapper_cls=None)¶
- Register an ObjectMapper to use for a Container class type
If mapper_cls is not specified, returns a decorator for registering an ObjectMapper class as the mapper for container_cls. If mapper_cls is specified, register the class as the mapper for container_cls
- hdmf.common.load_namespaces(namespace_path)¶
Load namespaces from file
- hdmf.common.available_namespaces()¶
- hdmf.common.get_class(data_type, namespace, post_init_method=None)¶
Get the class object of the Container subclass corresponding to a given data_type.
- hdmf.common.get_type_map(extensions=None)¶
- Get a TypeMap to use for I/O using the given extensions. If no extensions are provided,
return a TypeMap that uses the core namespace
- hdmf.common.get_manager(extensions=None)¶
- Get a BuildManager to use for I/O using the given extensions. If no extensions are provided,
return a BuildManager that uses the core namespace
- hdmf.common.validate(io, namespace='hdmf-common', experimental=False)¶
Validate a file against a namespace
- hdmf.common.get_hdf5io(path=None, mode='r', manager=None, comm=None, file=None, driver=None, aws_region=None, herd_path=None)¶
A convenience method for getting an HDF5IO object using an HDMF-common build manager if none is provided.
- Parameters:
  - mode (str) – the mode to open the HDF5 file with, one of (“w”, “r”, “r+”, “a”, “w-”, “x”). See h5py.File for more details.
  - manager (TypeMap or BuildManager) – the BuildManager or a TypeMap to construct a BuildManager to use for I/O
  - comm (Intracomm) – the MPI communicator to use for parallel I/O
  - file (File or S3File or RemFile) – a pre-existing h5py.File, S3File, or RemFile object
  - driver (str) – driver for h5py to use when opening HDF5 file
  - aws_region (str) – if driver is ros3, then specify the AWS region of the URL
  - herd_path (str) – the path to read/write the HERD file
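A short sketch of writing and reading an hdmf-common container with this helper (the file name and table contents are arbitrary); get_hdf5io is typically used as a context manager:
from hdmf.common import DynamicTable, get_hdf5io
table = DynamicTable(name="trials", description="example table")
table.add_column(name="start_time", description="start time of each trial (s)")
table.add_row(start_time=0.0)
# Write the table to an HDF5 file using the hdmf-common build manager.
with get_hdf5io("trials.h5", mode="w") as io:
    io.write(table)
# Read it back.
with get_hdf5io("trials.h5", mode="r") as io:
    read_table = io.read()
    print(read_table.to_dataframe())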
hdmf.container module¶
- class hdmf.container.HERDManager¶
Bases:
object
This class manages whether to set/attach an instance of HERD to the subclass.
- link_resources(herd)¶
Method to attach an instance of HERD in order to auto-add terms/references to data.
- Parameters:
  - herd (HERD) – The external resources to be used for the container.
- get_linked_resources()¶
- class hdmf.container.AbstractContainer(name)¶
Bases:
object
- Parameters:
  - name (str) – the name of this container
- property data_type¶
Return the spec data type associated with this container.
- classmethod get_fields_conf()¶
- property read_io¶
The HDMFIO object used for reading the container.
This property will typically be None if this Container is not a root Container (i.e., if parent is not None). Use get_read_io instead if you want to retrieve the HDMFIO object used for reading from the parent container.
- get_read_io()¶
Get the io object used to read this container.
If self.read_io is None, this function will iterate through the parents and return the first io object found on a parent container
- Returns:
The HDMFIO object used to read this container. Returns None in case no io object is found, e.g., in case this container has not been read from file.
- property name¶
The name of this Container
- get_ancestor(data_type=None)¶
Traverse parent hierarchy and return first instance of the specified data_type
- Parameters:
  - data_type (str) – the data_type to search for
- all_children()¶
Get a list of all child objects and their child objects recursively.
If the object has an object_id, the object will be added to “ret” to be returned. If that object has children, they will be added to the “stack” so that they are in turn: 1) checked to see if they have an object_id and, if so, added to “ret”, and 2) have their own children checked as well.
- property all_objects¶
Get a LabelledDict that indexes all child objects and their children by object ID.
- get_ancestors()¶
- property fields¶
Subclasses use this class attribute to add properties to autogenerate. fields allows for lists and for dicts with the keys {‘name’, ‘child’, ‘required_name’, ‘doc’, ‘settable’}:
1. name: The name of the field property
2. child: A boolean value to set the parent/child relationship between the field property and the container.
3. required_name: The name the field property must have such that name matches required_name.
4. doc: Documentation of the field property
5. settable: If true, a setter function is created so that the field can be changed after creation.
- property object_id¶
- generate_new_id(recurse=True)¶
Changes the object ID of this Container and all of its children to a new UUID string.
- Parameters:
  - recurse (bool) – whether or not to change the object ID of this container’s children
- property modified¶
- set_modified(modified=True)¶
- Parameters:
  - modified (bool) – whether or not this Container has been modified
- property children¶
- add_child(child=None)¶
- Parameters:
  - child (Container) – the child Container for this Container
- classmethod type_hierarchy()¶
- property container_source¶
The source of this Container
- property parent¶
The parent Container of this Container
- reset_parent()¶
Reset the parent of this Container to None and remove the Container from the children of its parent.
Use with caution. This can result in orphaned containers and broken links.
- class hdmf.container.Container(name)¶
Bases:
AbstractContainer
A container that can contain other containers and has special functionality for printing.
- Parameters:
  - name (str) – the name of this container
- set_data_io(dataset_name: str, data_io_class: Type[DataIO], data_io_kwargs: dict | None = None, **kwargs)¶
Apply DataIO object to a dataset field of the Container.
- Parameters:
dataset_name (str) – Name of dataset to wrap in DataIO
data_io_class (Type[DataIO]) – Class to use for DataIO, e.g. H5DataIO or ZarrDataIO
data_io_kwargs (dict) – keyword arguments passed to the constructor of the DataIO class.
**kwargs – DEPRECATED. Use data_io_kwargs instead. kwargs are passed to the constructor of the DataIO class.
- data_type = 'Container'¶
- namespace = 'hdmf-common'¶
- class hdmf.container.Data(name, data)¶
Bases:
AbstractContainer
A class for representing dataset containers
- Parameters:
  - name (str) – the name of this container
  - data (str or int or float or bytes or bool or ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the source of the data
- property data¶
- property shape¶
Get the shape of the data represented by this container
- Returns:
Shape tuple
- Return type:
tuple of ints
- set_dataio(dataio)¶
Apply DataIO object to the data held by this Data object
- Parameters:
  - dataio (DataIO) – the DataIO to apply to the data held by this Data
- set_data_io(data_io_class: Type[DataIO], data_io_kwargs: dict) → None¶
Apply DataIO object to the data held by this Data object.
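A hedged sketch of wrapping the data held by a Data object (here a VectorData column) in H5DataIO so it is written with compression; the column contents are arbitrary:
from hdmf.backends.hdf5 import H5DataIO
from hdmf.common import VectorData
col = VectorData(
    name="start_time",
    description="start time of each trial (s)",
    data=[0.0, 1.5, 3.0],
)
# Wrap col.data in an H5DataIO so HDF5IO writes it gzip-compressed.
col.set_data_io(H5DataIO, dict(compression="gzip"))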
- transform(func)¶
Transform data from the current underlying state.
This function can be used to permanently load data from disk, or convert to a different representation, such as a torch.Tensor
- Parameters:
  - func (function) – a function to transform data
- __getitem__(args)¶
- get(args)¶
- append(arg)¶
- extend(arg)¶
Add all the elements of the iterable arg to the end of the data of this Data container.
- Parameters:
arg – The iterable to add to the end of this Data container
- data_type = 'Data'¶
- namespace = 'hdmf-common'¶
- class hdmf.container.DataRegion(name, data)¶
Bases:
Data
- Parameters:
  - name (str) – the name of this container
  - data (str or int or float or bytes or bool or ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the source of the data
- abstract property data¶
The target data that this region applies to
- abstract property region¶
The region that indexes into data e.g. slice or list of indices
- class hdmf.container.MultiContainerInterface(name)¶
Bases:
Container
Class that dynamically defines methods to support a Container holding multiple Containers of the same type.
To use, extend this class and create a dictionary as a class attribute with any of the following keys:
- ‘attr’ to name the attribute that stores the Container instances
- ‘type’ to provide the Container object type (type or list/tuple of types, type can be a docval macro)
- ‘add’ to name the method for adding Container instances
- ‘get’ to name the method for getting Container instances
- ‘create’ to name the method for creating Container instances (only if a single type is specified)
If the attribute does not exist in the class, it will be generated. If it does exist, it should behave like a dict.
The keys ‘attr’, ‘type’, and ‘add’ are required.
- Parameters:
  - name (str) – the name of this container
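A minimal sketch of the pattern described above; in HDMF the class attribute is named __clsconf__ (as also referenced by MCIClassGenerator below), and the class and attribute names here are illustrative:
from hdmf.container import Container, MultiContainerInterface
class ContainerBundle(MultiContainerInterface):
    """Hypothetical container that holds many Container objects of the same type."""
    __clsconf__ = {
        'attr': 'containers',         # name of the attribute storing the instances
        'type': Container,            # type of the stored instances
        'add': 'add_container',       # generated method for adding instances
        'get': 'get_container',       # generated method for retrieving instances by name
        'create': 'create_container', # generated factory method (single type only)
    }
bundle = ContainerBundle(name='bundle')
bundle.add_container(Container(name='c1'))
print(bundle.get_container('c1').name)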
- class hdmf.container.Row¶
Bases:
object
A class for representing rows from a Table.
The Table class can be indicated with the __table__ class attribute. Doing so will set constructor arguments for the Row class and ensure that Row.idx is set appropriately when a Row is added to the Table. It will also add functionality to the Table class for getting Row objects.
Note, the Row class is not needed for working with Table objects. This is merely convenience functionality for working with Tables.
- property idx¶
The index of this row in its respective Table
- property table¶
The Table this Row comes from
- class hdmf.container.RowGetter(table)¶
Bases:
object
A simple class for providing __getitem__ functionality that returns Row objects to a Table.
- __getitem__(idx)¶
- class hdmf.container.Table(columns, name, data=[])¶
Bases:
Data
Subclasses should specify the class attribute __columns__.
This should be a list of dictionaries with the following keys:
- name: the column name
- type: the type of data in this column
- doc: a brief description of what gets stored in this column
For reference, this list of dictionaries will be used with docval to autogenerate the add_row method for adding data to this table. If __columns__ is not specified, no custom add_row method will be added.
The class attribute __defaultname__ can also be set to specify a default name for the table class. If __defaultname__ is not specified, then name will need to be specified when the class is instantiated.
A Table class can be paired with a Row class for conveniently working with rows of a Table. This pairing must be indicated in the Row class implementation. See Row for more details.
- Parameters:
  - columns (list or tuple) – a list of the columns in this table
  - name (str) – the name of this container
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the source of the data
- property columns¶
- which(**kwargs)¶
Query a table
- __getitem__(args)¶
- to_dataframe()¶
Produce a pandas DataFrame containing this table’s data.
- classmethod from_dataframe(df, name=None, extra_ok=False)¶
- Construct an instance of Table (or a subclass) from a pandas DataFrame. The columns of the dataframe
should match the columns defined on the Table subclass.
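A brief sketch of a Table subclass using __columns__ and __defaultname__ as described above (the table and column names are made up); the autogenerated add_row takes one keyword per column:
from hdmf.container import Table
class PeakTable(Table):
    """Hypothetical table with statically declared columns."""
    __defaultname__ = 'peaks'
    __columns__ = [
        {'name': 'channel', 'type': int, 'doc': 'channel on which the peak was detected'},
        {'name': 'amplitude', 'type': float, 'doc': 'peak amplitude'},
    ]
peaks = PeakTable()                       # name defaults to __defaultname__
peaks.add_row(channel=3, amplitude=0.42)  # autogenerated from __columns__
print(peaks.to_dataframe())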
hdmf.build package¶
Submodules¶
hdmf.build.builders module¶
- class hdmf.build.builders.Builder(name, parent=None, source=None)¶
Bases:
dict
- Parameters:
- property path¶
The path of this builder.
- property name¶
The name of this builder.
- property source¶
The source of this builder.
- property parent¶
The parent builder of this builder.
- class hdmf.build.builders.BaseBuilder(name, attributes={}, parent=None, source=None)¶
Bases:
Builder
- Parameters:
  - name (str) – The name of the builder.
  - attributes (dict) – A dictionary of attributes to create in this builder.
  - parent (GroupBuilder) – The parent builder of this builder.
  - source (str) – The source of the data represented in this builder
- property location¶
The location of this Builder in its source.
- property attributes¶
The attributes stored in this Builder object.
- class hdmf.build.builders.GroupBuilder(name, groups={}, datasets={}, attributes={}, links={}, parent=None, source=None)¶
Bases:
BaseBuilder
Create a builder object for a group.
- Parameters:
  - name (str) – The name of the group.
  - groups (dict or list) – A dictionary or list of subgroups to add to this group. If a dict is provided, only the values are used.
  - datasets (dict or list) – A dictionary or list of datasets to add to this group. If a dict is provided, only the values are used.
  - attributes (dict) – A dictionary of attributes to create in this group.
  - links (dict or list) – A dictionary or list of links to add to this group. If a dict is provided, only the values are used.
  - parent (GroupBuilder) – The parent builder of this builder.
  - source (str) – The source of the data represented in this builder.
- property source¶
The source of this Builder
- property groups¶
The subgroups contained in this group.
- property datasets¶
The datasets contained in this group.
- property links¶
The links contained in this group.
- set_attribute(name, value)¶
Set an attribute for this group.
- Parameters:
  - name (str) – The name of the attribute.
  - value (None) – The attribute value.
- set_group(builder)¶
Add a subgroup to this group.
- Parameters:
  - builder (GroupBuilder) – The GroupBuilder to add to this group.
- set_dataset(builder)¶
Add a dataset to this group.
- Parameters:
  - builder (DatasetBuilder) – The DatasetBuilder to add to this group.
- set_link(builder)¶
Add a link to this group.
- Parameters:
  - builder (LinkBuilder) – The LinkBuilder to add to this group.
- is_empty()¶
Returns True if this group has no datasets, links, attributes, or non-empty subgroups; False otherwise.
- __getitem__(key)¶
Like dict.__getitem__, but looks in groups, datasets, attributes, and links sub-dictionaries. Key can be a posix path to a sub-builder.
- get(key, default=None)¶
Like dict.get, but looks in groups, datasets, attributes, and links sub-dictionaries. Key can be a posix path to a sub-builder.
- items()¶
Like dict.items, but iterates over items in groups, datasets, attributes, and links sub-dictionaries.
- keys()¶
Like dict.keys, but iterates over keys in groups, datasets, attributes, and links sub-dictionaries.
- values()¶
Like dict.values, but iterates over values in groups, datasets, attributes, and links sub-dictionaries.
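Builders are low-level, backend-agnostic representations that most users never construct by hand; still, a small hedged sketch can make the hierarchy and the dict-like access concrete (names and values are arbitrary):
import numpy as np
from hdmf.build.builders import DatasetBuilder, GroupBuilder
# A dataset builder holding raw values plus an attribute.
dset = DatasetBuilder(name="data", data=np.arange(10), attributes={"unit": "seconds"})
# A group builder that contains the dataset and its own attributes.
root = GroupBuilder(name="root", datasets={"data": dset}, attributes={"comment": "example"})
print(root["data"].data[:5])   # dict-like access across datasets, groups, attributes, and links
print(root.get("comment"))     # attributes are reachable the same way
print(root.is_empty())         # False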
- class hdmf.build.builders.DatasetBuilder(name, data=None, dtype=None, attributes={}, maxshape=None, chunks=False, parent=None, source=None)¶
Bases:
BaseBuilder
Create a Builder object for a dataset
- Parameters:
  - name (str) – The name of the dataset.
  - data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator or str or int or float or bytes or bool or DataIO or DatasetBuilder or RegionBuilder or Iterable or datetime or date) – The data in this dataset.
  - dtype (type or dtype or str or list) – The datatype of this dataset.
  - attributes (dict) – A dictionary of attributes to create in this dataset.
  - maxshape (int or tuple) – The shape of this dataset. Use None for scalars.
  - chunks (bool) – Whether or not to chunk this dataset.
  - parent (GroupBuilder) – The parent builder of this builder.
  - source (str) – The source of the data in this builder.
- OBJECT_REF_TYPE = 'object'¶
- REGION_REF_TYPE = 'region'¶
- property data¶
The data stored in the dataset represented by this builder.
- property chunks¶
Whether or not this dataset is chunked.
- property maxshape¶
The max shape of this dataset.
- property dtype¶
The data type of this dataset.
- class hdmf.build.builders.LinkBuilder(builder, name=None, parent=None, source=None)¶
Bases:
Builder
Create a builder object for a link.
- Parameters:
  - builder (DatasetBuilder or GroupBuilder) – The target group or dataset of this link.
  - name (str) – The name of the link
  - parent (GroupBuilder) – The parent builder of this builder
  - source (str) – The source of the data in this builder
- property builder¶
The target builder object.
- class hdmf.build.builders.ReferenceBuilder(builder)¶
Bases:
dict
Create a builder object for a reference.
- Parameters:
  - builder (DatasetBuilder or GroupBuilder) – The group or dataset this reference applies to.
- property builder¶
The target builder object.
- class hdmf.build.builders.RegionBuilder(region, builder)¶
Bases:
ReferenceBuilder
Create a builder object for a region reference.
- Parameters:
  - region (slice or tuple or list or RegionReference) – The region, i.e. slice or indices, into the target dataset.
  - builder (DatasetBuilder) – The dataset this region reference applies to.
- property region¶
The selected region of the target dataset.
hdmf.build.classgenerator module¶
- class hdmf.build.classgenerator.ClassGenerator¶
Bases:
object
- property custom_generators¶
- register_generator(generator)¶
Add a custom class generator to this ClassGenerator.
Generators added later are run first. Duplicates are moved to the top of the list.
- Parameters:
  - generator (type) – the CustomClassGenerator class to register
- generate_class(data_type, spec, parent_cls, attr_names, type_map, post_init_method=None)¶
- Get the container class from data type specification.
If no class has been associated with the data_type from namespace, a class will be dynamically created and returned.
- Parameters:
- Returns:
the class for the given namespace and data_type
- Return type:
- class hdmf.build.classgenerator.CustomClassGenerator(*args, **kwargs)¶
Bases:
object
Subclass this class and register an instance to alter how classes are auto-generated.
- classmethod apply_generator_to_field(field_spec, bases, type_map)¶
Return True to signal that this generator should return on all fields not yet processed.
- classmethod process_field_spec(classdict, docval_args, parent_cls, attr_name, not_inherited_fields, type_map, spec)¶
Add __fields__ to the classdict and update the docval args for the field spec with the given attribute name.
- Parameters:
  - classdict – The dict to update with __fields__ (or a different parent_cls._fieldsname).
  - docval_args – The list of docval arguments.
  - parent_cls – The parent class.
  - attr_name – The attribute name of the field spec for the container class to generate.
  - not_inherited_fields – Dictionary of fields not inherited from the parent class.
  - type_map – The type map to use.
  - spec – The spec for the container class to generate.
- classmethod post_process(classdict, bases, docval_args, spec)¶
Convert classdict[‘__fields__’] to tuple and update docval args for a fixed name and default name.
- Parameters:
  - classdict – The class dictionary to convert with ‘__fields__’ key (or a different bases[0]._fieldsname).
  - bases – The list of base classes.
  - docval_args – The dict of docval arguments.
  - spec – The spec for the container class to generate.
- classmethod set_init(classdict, bases, docval_args, not_inherited_fields, name)¶
- class hdmf.build.classgenerator.MCIClassGenerator(*args, **kwargs)¶
Bases:
CustomClassGenerator
- classmethod apply_generator_to_field(field_spec, bases, type_map)¶
Return True if the field spec has quantity * or +, False otherwise.
- classmethod process_field_spec(classdict, docval_args, parent_cls, attr_name, not_inherited_fields, type_map, spec)¶
Add __clsconf__ to the classdict and update the docval args for the field spec with the given attribute name.
- Parameters:
  - classdict – The dict to update with __clsconf__.
  - docval_args – The list of docval arguments.
  - parent_cls – The parent class.
  - attr_name – The attribute name of the field spec for the container class to generate.
  - not_inherited_fields – Dictionary of fields not inherited from the parent class.
  - type_map – The type map to use.
  - spec – The spec for the container class to generate.
- classmethod post_process(classdict, bases, docval_args, spec)¶
Add MultiContainerInterface to the list of base classes.
- Parameters:
  - classdict – The class dictionary.
  - bases – The list of base classes.
  - docval_args – The dict of docval arguments.
  - spec – The spec for the container class to generate.
- classmethod set_init(classdict, bases, docval_args, not_inherited_fields, name)¶
hdmf.build.errors module¶
Module for build error definitions
- exception hdmf.build.errors.BuildError(builder, reason)¶
Bases:
Exception
Error raised when building a container into a builder.
- exception hdmf.build.errors.OrphanContainerBuildError(builder, container)¶
Bases:
BuildError
- Parameters:
  - builder (Builder) – the builder containing the broken link
  - container (AbstractContainer) – the container that has no parent
- exception hdmf.build.errors.ReferenceTargetNotBuiltError(builder, container)¶
Bases:
BuildError
- Parameters:
  - builder (Builder) – the builder containing the reference that cannot be found
  - container (AbstractContainer) – the container that is not built yet
hdmf.build.manager module¶
- class hdmf.build.manager.Proxy(manager, source, location, namespace, data_type)¶
Bases:
object
A temporary object to represent a Container. This gets used when resolving the true location of a Container’s parent. Proxy objects allow simple bookkeeping of all potential parents a Container may have. This object is used by providing all the necessary information for describing the object. This object gets passed around and candidates are accumulated. Upon calling resolve, all saved candidates are matched against the information (provided to the constructor). The candidate that has an exact match is returned.
- property source¶
The source of the object e.g. file source
- property location¶
The location of the object. This can be thought of as a unique path
- property namespace¶
The namespace from which the data_type of this Proxy came from
- property data_type¶
The data_type of Container that should match this Proxy
- matches(object)¶
- Parameters:
  - object (BaseBuilder or Container) – the container or builder to get a proxy for
- add_candidate(container)¶
- Parameters:
  - container (Container) – the Container to add as a candidate match
- resolve()¶
- class hdmf.build.manager.BuildManager(type_map)¶
Bases:
object
A class for managing builds of AbstractContainers
- property namespace_catalog¶
- property type_map¶
- get_proxy(object, source=None)¶
- Parameters:
  - object (BaseBuilder or AbstractContainer) – the container or builder to get a proxy for
  - source (str) – the source of container being built i.e. file path
- build(container, source=None, spec_ext=None, export=False, root=False)¶
Build the GroupBuilder/DatasetBuilder for the given AbstractContainer
- Parameters:
  - container (AbstractContainer) – the container to convert to a Builder
  - source (str) – the source of container being built i.e. file path
  - spec_ext (BaseStorageSpec) – a spec that further refines the base specification
  - export (bool) – whether this build is for exporting
  - root (bool) – whether the container is the root of the build process
- prebuilt(container, builder)¶
Save the Builder for a given AbstractContainer for future use
- Parameters:
  - container (AbstractContainer) – the AbstractContainer to save as prebuilt
  - builder (DatasetBuilder or GroupBuilder) – the Builder representation of the given container
- queue_ref(func)¶
Set aside creating ReferenceBuilders
- purge_outdated()¶
- clear_cache()¶
- get_builder(container)¶
Return the prebuilt builder for the given container or None if it does not exist.
- Parameters:
  - container (AbstractContainer) – the container to get the builder for
- construct(builder)¶
Construct the AbstractContainer represented by the given builder
- Parameters:
  - builder (DatasetBuilder or GroupBuilder) – the builder to construct the AbstractContainer from
- get_cls(builder)¶
Get the class object for the given Builder
- Parameters:
  - builder (Builder) – the Builder to get the class object for
- get_builder_name(container)¶
Get the name a Builder should be given
- Parameters:
  - container (AbstractContainer) – the container to convert to a Builder
- Returns:
The name a Builder should be given when building this container
- Return type:
- get_subspec(spec, builder)¶
Get the specification from this spec that corresponds to the given builder
- Parameters:
  - spec (DatasetSpec or GroupSpec) – the parent spec to search
  - builder (DatasetBuilder or GroupBuilder or LinkBuilder) – the builder to get the sub-specification for
- get_builder_ns(builder)¶
Get the namespace of a builder
- Parameters:
  - builder (DatasetBuilder or GroupBuilder or LinkBuilder) – the builder to get the namespace for
- get_builder_dt(builder)¶
Get the data_type of a builder
- Parameters:
  - builder (DatasetBuilder or GroupBuilder or LinkBuilder) – the builder to get the data_type for
- is_sub_data_type(builder, parent_data_type)¶
Return whether or not data_type of builder is a sub-data_type of parent_data_type
- Parameters:
  - builder (GroupBuilder or DatasetBuilder or AbstractContainer) – the builder or container to check
  - parent_data_type (str) – the potential parent data_type that refers to a data_type
- Returns:
True if data_type of builder is a sub-data_type of parent_data_type, False otherwise
- Return type:
- class hdmf.build.manager.TypeSource(namespace, data_type)¶
Bases:
object
A class to indicate the source of a data_type in a namespace. This class should only be used by TypeMap
- Parameters:
- property namespace¶
- property data_type¶
- class hdmf.build.manager.TypeMap(namespaces=None, mapper_cls=None, type_config=None)¶
Bases:
object
A class to maintain the map between ObjectMappers and AbstractContainer classes
- Parameters:
  - namespaces (NamespaceCatalog) – the NamespaceCatalog to use
  - mapper_cls (type) – the ObjectMapper class to use
  - type_config (TypeConfigurator) – The TypeConfigurator to use.
- property namespace_catalog¶
- property container_types¶
- copy_mappers(type_map)¶
- merge(type_map, ns_catalog=False)¶
- register_generator(generator)¶
Add a custom class generator.
- Parameters:
  - generator (type) – the CustomClassGenerator class to register
- load_namespaces(namespace_path, resolve=True, reader=None)¶
- Load namespaces from a namespace file.
This method will call load_namespaces on the NamespaceCatalog used to construct this TypeMap. Additionally, it will process the return value to keep track of what types were included in the loaded namespaces. Calling load_namespaces here has the advantage of being able to keep track of type dependencies across namespaces.
- Parameters:
  - namespace_path (str) – the path to the file containing the namespace(s) to load
  - resolve (bool) – whether or not to include objects from included/parent spec objects
  - reader (SpecReader) – the class to use for reading specifications
- Returns:
the namespaces loaded from the given file
- Return type:
- get_container_cls(namespace, data_type, autogen=True)¶
- Get the container class from data type specification.
If no class has been associated with the data_type from namespace, a class will be dynamically created and returned.
- get_dt_container_cls(data_type, namespace=None, post_init_method=None, autogen=True)¶
- Get the container class from data type specification.
If no class has been associated with the data_type from namespace, a class will be dynamically created and returned.
Replaces get_container_cls but namespace is optional. If namespace is unknown, it will be looked up from all namespaces.
- Parameters:
- Returns:
the class for the given namespace and data_type
- Return type:
- get_builder_dt(builder)¶
Get the data_type of a builder
- Parameters:
  - builder (DatasetBuilder or GroupBuilder or LinkBuilder) – the builder to get the data_type for
- get_builder_ns(builder)¶
Get the namespace of a builder
- Parameters:
  - builder (DatasetBuilder or GroupBuilder or LinkBuilder) – the builder to get the namespace for
- get_cls(builder)¶
Get the class object for the given Builder
- Parameters:
  - builder (Builder) – the Builder object to get the corresponding AbstractContainer class for
- get_subspec(spec, builder)¶
Get the specification from this spec that corresponds to the given builder
- Parameters:
  - spec (DatasetSpec or GroupSpec) – the parent spec to search
  - builder (DatasetBuilder or GroupBuilder or LinkBuilder) – the builder to get the sub-specification for
- get_container_ns_dt(obj)¶
- get_container_cls_dt(cls)¶
- get_container_classes(namespace=None)¶
- Parameters:
  - namespace (str) – the namespace to get the container classes for
- get_map(obj)¶
Return the ObjectMapper object that should be used for the given container
- Parameters:
  - obj (AbstractContainer or Builder) – the object to get the ObjectMapper for
- Returns:
the ObjectMapper to use for mapping the given object
- Return type:
ObjectMapper
- register_container_type(namespace, data_type, container_cls)¶
Map a container class to a data_type
- Parameters:
  - namespace (str) – the namespace containing the data_type to map the class to
  - data_type (str) – the data_type to map the class to
  - container_cls (TypeSource or type) – the class to map to the specified data_type
- register_map(container_cls, mapper_cls)¶
Map a container class to an ObjectMapper class
- build(container, manager=None, source=None, builder=None, spec_ext=None, export=False)¶
Build the GroupBuilder/DatasetBuilder for the given AbstractContainer
- Parameters:
  - container (AbstractContainer) – the container to convert to a Builder
  - manager (BuildManager) – the BuildManager to use for managing this build
  - source (str) – the source of container being built i.e. file path
  - builder (BaseBuilder) – the Builder to build on
  - spec_ext (BaseStorageSpec) – a spec extension
  - export (bool) – whether this build is for exporting
- construct(builder, build_manager=None, parent=None)¶
Construct the AbstractContainer represented by the given builder
- Parameters:
  - builder (DatasetBuilder or GroupBuilder) – the builder to construct the AbstractContainer from
  - build_manager (BuildManager) – the BuildManager for constructing
  - parent (Proxy or Container) – the parent Container/Proxy for the Container being built
- get_builder_name(container)¶
Get the name a Builder should be given
- Parameters:
  - container (AbstractContainer) – the container to convert to a Builder
- Returns:
The name a Builder should be given when building this container
- Return type:
hdmf.build.map module¶
hdmf.build.objectmapper module¶
- class hdmf.build.objectmapper.ObjectMapper(spec)¶
Bases:
object
A class for mapping between Spec objects and AbstractContainer attributes
Create a map from AbstractContainer attributes to specifications
- Parameters:
  - spec (DatasetSpec or GroupSpec) – The specification for mapping objects to builders
- classmethod no_convert(obj_type)¶
Specify an object type that ObjectMappers should not convert.
- classmethod convert_dtype(spec, value, spec_dtype=None)¶
Convert values to the specified dtype. For example, if a literal int is passed in to a field that is specified as an unsigned integer, this function will convert the Python int to a numpy unsigned int.
- Parameters:
spec – The DatasetSpec or AttributeSpec to which this value is being applied
value – The value being converted to the spec dtype
spec_dtype – Optional override of the dtype in spec.dtype. Used to specify the parent dtype when the given extended spec lacks a dtype.
- Returns:
The function returns a tuple consisting of 1) the value, and 2) the data type. The value is returned as the function may convert the input value to comply with the dtype specified in the schema.
- static constructor_arg(name)¶
Decorator to override the default mapping scheme for a given constructor argument.
Decorate ObjectMapper methods with this function when extending ObjectMapper to override the default scheme for mapping between AbstractContainer and Builder objects. The decorated method should accept as its first argument the Builder object that is being mapped. The method should return the value to be passed to the target AbstractContainer class constructor argument given by name.
- Parameters:
  - name (str) – the name of the constructor argument
- static object_attr(name)¶
Decorator to override the default mapping scheme for a given object attribute.
Decorate ObjectMapper methods with this function when extending ObjectMapper to override the default scheme for mapping between AbstractContainer and Builder objects. The decorated method should accept as its first argument the AbstractContainer object that is being mapped. The method should return the child Builder object (or scalar if the object attribute corresponds to an AttributeSpec) that represents the attribute given by name.
- Parameters:
  - name (str) – the name of the object attribute
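A hedged sketch of overriding how a single constructor argument is derived from a builder. The container class, logic, and method name are illustrative; in practice such a mapper is registered for a Container type with hdmf.common.register_map once that type exists (registration is omitted here), and, as in other HDMF mappers, the decorated method also receives the build manager:
from hdmf.build.objectmapper import ObjectMapper
class MyContainerMap(ObjectMapper):
    """Hypothetical mapper that customizes the 'name' constructor argument."""
    @ObjectMapper.constructor_arg('name')
    def carg_name(self, builder, manager):
        # Derive the constructor argument 'name' from the builder instead of
        # using the default mapping (purely illustrative logic).
        return builder.name.lower()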
- property spec¶
the Spec used in this ObjectMapper
- get_container_name(*args)¶
- classmethod convert_dt_name(spec)¶
Construct the attribute name corresponding to a specification
- Parameters:
  - spec (Spec) – the specification to get the name for
- classmethod get_attr_names(spec)¶
Get the attribute names for each subspecification in a Spec
- Parameters:
  - spec (Spec) – the specification to get the object attribute names for
- map_attr(attr_name, spec)¶
Map an attribute to spec. Use this to override default behavior
- get_attr_spec(attr_name)¶
Return the Spec for a given attribute
- Parameters:
  - attr_name (str) – the name of the attribute
- get_carg_spec(carg_name)¶
Return the Spec for a given constructor argument
- Parameters:
  - carg_name (str) – the name of the constructor argument
- map_const_arg(const_arg, spec)¶
Map an attribute to spec. Use this to override default behavior
- unmap(spec)¶
Remove any mapping for a specification. Use this to override default mapping
- Parameters:
  - spec (Spec) – the spec to map the attribute to
- map_spec(attr_carg, spec)¶
Map the given specification to the construct argument and object attribute
- get_attribute(spec)¶
Get the object attribute name for the given Spec
- get_attr_value(spec, container, manager)¶
Get the value of the attribute corresponding to this spec from the given container
- Parameters:
  - spec (Spec) – the spec to get the attribute value for
  - container (AbstractContainer) – the container to get the attribute value from
  - manager (BuildManager) – the BuildManager used for managing this build
- get_const_arg(spec)¶
Get the constructor argument for the given Spec
- build(container, manager, parent=None, source=None, builder=None, spec_ext=None, export=False)¶
Convert an AbstractContainer to a Builder representation.
References are not added but are queued to be added in the BuildManager.
- Parameters:
  - container (AbstractContainer) – the container to convert to a Builder
  - manager (BuildManager) – the BuildManager to use for managing this build
  - parent (GroupBuilder) – the parent of the resulting Builder
  - source (str) – the source of container being built i.e. file path
  - builder (BaseBuilder) – the Builder to build on
  - spec_ext (BaseStorageSpec) – a spec extension
  - export (bool) – whether this build is for exporting
- Returns:
the Builder representing the given AbstractContainer
- Return type:
- construct(builder, manager, parent=None)¶
Construct an AbstractContainer from the given Builder
- Parameters:
  - builder (DatasetBuilder or GroupBuilder) – the builder to construct the AbstractContainer from
  - manager (BuildManager) – the BuildManager for this build
  - parent (Proxy or AbstractContainer) – the parent AbstractContainer/Proxy for the AbstractContainer being built
- get_builder_name(container)¶
Get the name of a Builder that represents a AbstractContainer
- Parameters:
  - container (AbstractContainer) – the AbstractContainer to get the Builder name for
- constructor_args = {'name': <function ObjectMapper.get_container_name>}¶
- obj_attrs = {}¶
hdmf.build.warnings module¶
Module for build warnings
- exception hdmf.build.warnings.BuildWarning¶
Bases:
UserWarning
Base class for warnings that are raised during the building of a container.
- exception hdmf.build.warnings.IncorrectQuantityBuildWarning¶
Bases:
BuildWarning
Raised when a container field contains a number of groups/datasets/links that is not allowed by the spec.
- exception hdmf.build.warnings.MissingRequiredBuildWarning¶
Bases:
BuildWarning
Raised when a required field is missing.
- exception hdmf.build.warnings.MissingRequiredWarning¶
Bases:
MissingRequiredBuildWarning
Raised when a required field is missing.
- exception hdmf.build.warnings.OrphanContainerWarning¶
Bases:
BuildWarning
Raised when a container is built without a parent.
- exception hdmf.build.warnings.DtypeConversionWarning¶
Bases:
UserWarning
Raised when a value is converted to a different data type in order to match the specification.
Module contents¶
hdmf.spec package¶
Submodules¶
hdmf.spec.catalog module¶
- class hdmf.spec.catalog.SpecCatalog¶
Bases:
object
Create a new catalog for storing specifications
** Private Instance Variables **
- Variables:
__specs – Dict with the specification of each registered type
__parent_types – Dict with parent types for each registered type
__spec_source_files – Dict with the path to the source files (if available) for each registered type
__hierarchy – Dict describing the hierarchy for each registered type. NOTE: Always use SpecCatalog.get_hierarchy(…) to retrieve the hierarchy as this dictionary is used like a cache, i.e., to avoid repeated calculation of the hierarchy but the contents are computed on first request by SpecCatalog.get_hierarchy(…)
- register_spec(spec, source_file=None)¶
Associate a specified object type with a specification
- Parameters:
  - spec (BaseStorageSpec) – a Spec object
  - source_file (str) – path to the source file from which the spec was loaded
- get_spec(data_type)¶
Get the Spec object for the given type
- Parameters:
  - data_type (str) – the data_type to get the Spec for
- Returns:
the specification for writing the given object type to HDF5
- Return type:
Spec
- get_registered_types()¶
Return all registered specifications
- get_spec_source_file(data_type)¶
- Return the path to the source file from which the spec for the given type was loaded. None is returned if no file path is available for the spec. Note: The spec in the file may not be identical to the object in case the spec is modified after load.
- Parameters:
  - data_type (str) – the data_type of the spec to get the source file for
- Returns:
the path to source specification file from which the spec was originally loaded or None
- Return type:
str
- auto_register(spec, source_file=None)¶
Register this specification and all sub-specifications, using data_type as the object type name
- Parameters:
  - spec (BaseStorageSpec) – the Spec object to register
  - source_file (str) – path to the source file from which the spec was loaded
- Returns:
the types that were registered with this spec
- Return type:
- get_hierarchy(data_type)¶
For a given type get the type inheritance hierarchy for that type.
E.g., if we have a type MyContainer that inherits from BaseContainer then the result will be a tuple with the strings (‘MyContainer’, ‘BaseContainer’)
- get_full_hierarchy()¶
- Get the complete hierarchy of all types. The function attempts to sort types by name using
standard Python sorted.
- Returns:
Hierarchically nested OrderedDict with the hierarchy of all the types
- Return type:
- get_subtypes(data_type, recursive=True)¶
For a given data type recursively find all the subtypes that inherit from it.
E.g., assume we have the following inheritance hierarchy:
BaseContainer--+-->AContainer--->ADContainer
               |
               +-->BContainer
In this case, the subtypes of BaseContainer would be (AContainer, ADContainer, BContainer), the subtypes of AContainer would be (ADContainer), and the subtypes of BContainer would be empty ().
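The following sketch shows how a SpecCatalog might be used to register specs and query the type hierarchy; the type names and source file names are invented for the example.

from hdmf.spec import GroupSpec, SpecCatalog

catalog = SpecCatalog()

base_spec = GroupSpec(doc="a base container type", data_type_def="BaseContainer")
sub_spec = GroupSpec(doc="a derived container type",
                     data_type_def="AContainer",
                     data_type_inc="BaseContainer")

catalog.register_spec(base_spec, source_file="base.yaml")
catalog.register_spec(sub_spec, source_file="acontainer.yaml")

print(catalog.get_registered_types())        # e.g., ('BaseContainer', 'AContainer')
print(catalog.get_hierarchy("AContainer"))   # ('AContainer', 'BaseContainer')
print(catalog.get_subtypes("BaseContainer")) # ('AContainer',)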
hdmf.spec.namespace module¶
- class hdmf.spec.namespace.SpecNamespace(doc, name, schema, full_name=None, version=None, date=None, author=None, contact=None, catalog=None)¶
Bases:
dict
A namespace for specifications
- Parameters:
  - doc (str) – a description about what this namespace represents
  - name (str) – the name of this namespace
  - schema (list) – location of schema specification files or other Namespaces
  - full_name (str) – extended full name of this namespace
  - version (str or tuple or list) – Version number of the namespace
  - date (datetime or str) – Date last modified or released. Formatting is %Y-%m-%d %H:%M:%S, e.g., 2017-04-25 17:14:13
  - contact (str or list) – List of emails. Ordering should be the same as for author
  - catalog (SpecCatalog) – The SpecCatalog object for this SpecNamespace
- UNVERSIONED = None¶
- classmethod types_key()¶
Get the key used for specifying types to include from a file or namespace
Override this method to use a different name for ‘data_types’
- property full_name¶
String with full name or None
- property contact¶
String or list of strings with the contacts or None
- property author¶
String or list of strings with the authors or None
- property version¶
String, list, or tuple with the version or SpecNamespace.UNVERSIONED if the version is missing or empty
- property date¶
Date last modified or released.
- Returns:
datetime object, string, or None
- property name¶
String with short name or None
- property doc¶
- property schema¶
- get_source_files()¶
Get the list of names of the source files included in the schema of the namespace
- get_source_description(sourcefile)¶
- Get the description of a source file as described in the namespace. The result is a
dict which contains the ‘source’ and optionally ‘title’, ‘doc’ and ‘data_types’ imported from the source file
- property catalog¶
The SpecCatalog containing all the Specs
- get_spec(data_type)¶
Get the Spec object for the given data type
- get_registered_types()¶
Get the available types in this namespace
- Returns:
a tuple of the available data types
- Return type:
- get_hierarchy(data_type)¶
Get the extension hierarchy for the given data_type in this namespace
- classmethod build_namespace(**spec_dict)¶
- class hdmf.spec.namespace.SpecReader(source)¶
Bases:
object
- Parameters:
source (
str
) – the source from which this reader reads from
- property source¶
- abstract read_spec()¶
- abstract read_namespace()¶
- class hdmf.spec.namespace.YAMLSpecReader(indir='.')¶
Bases:
SpecReader
- Parameters:
indir (
str
) – the path spec files are relative to
- read_namespace(namespace_path)¶
- read_spec(spec_path)¶
- class hdmf.spec.namespace.NamespaceCatalog(group_spec_cls=<class 'hdmf.spec.spec.GroupSpec'>, dataset_spec_cls=<class 'hdmf.spec.spec.DatasetSpec'>, spec_namespace_cls=<class 'hdmf.spec.namespace.SpecNamespace'>)¶
Bases:
object
Create a catalog for storing multiple Namespaces
- Parameters:
- merge(ns_catalog)¶
- property namespaces¶
The namespaces in this NamespaceCatalog
- Returns:
a tuple of the available namespaces
- Return type:
- property dataset_spec_cls¶
The DatasetSpec class used in this NamespaceCatalog
- property group_spec_cls¶
The GroupSpec class used in this NamespaceCatalog
- property spec_namespace_cls¶
The SpecNamespace class used in this NamespaceCatalog
- add_namespace(name, namespace)¶
Add a namespace to this catalog
- Parameters:
  - name (str) – the name of this namespace
  - namespace (SpecNamespace) – the SpecNamespace object
- get_namespace(name)¶
Get the SpecNamespace with the given name
- Parameters:
  - name (str) – the name of this namespace
- Returns:
the SpecNamespace with the given name
- Return type:
- get_spec(namespace, data_type)¶
Get the Spec object for the given type from the given Namespace
- get_hierarchy(namespace, data_type)¶
Get the type hierarchy for a given data_type in a given namespace
- is_sub_data_type(namespace, data_type, parent_data_type)¶
Return whether or not data_type is a sub data_type of parent_data_type
- get_sources()¶
Get all the source specification files that were loaded in this catalog
- get_namespace_sources(namespace)¶
Get all the source specifications that were loaded for a given namespace
- Parameters:
namespace (
str
) – the name of the namespace
- get_types(source)¶
Get the types that were loaded from a given source
- Parameters:
source (
str
) – the name of the source
- load_namespaces(namespace_path, resolve=True, reader=None)¶
Load the namespaces in the given file
- Parameters:
  - namespace_path (str) – the path to the file containing the namespace(s) to load
  - resolve (bool) – whether or not to include objects from included/parent spec objects
  - reader (SpecReader) – the class to use for reading specifications
- Returns:
a dictionary describing the dependencies of loaded namespaces
- Return type:
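A minimal sketch of loading a namespace file into a NamespaceCatalog and inspecting what was registered; the file name my_namespace.namespace.yaml is a hypothetical path.

from hdmf.spec import NamespaceCatalog

ns_catalog = NamespaceCatalog()

# "my_namespace.namespace.yaml" is a hypothetical namespace file on disk
included = ns_catalog.load_namespaces("my_namespace.namespace.yaml")

for ns_name in ns_catalog.namespaces:
    spec_ns = ns_catalog.get_namespace(ns_name)
    print(ns_name, spec_ns.version, spec_ns.get_registered_types())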
hdmf.spec.spec module¶
- class hdmf.spec.spec.DtypeHelper¶
Bases:
object
- primary_dtype_synonyms = {'ascii': ['ascii', 'bytes'], 'bool': ['bool'], 'double': ['double', 'float64'], 'float': ['float', 'float32'], 'int': ['int32', 'int'], 'int8': ['int8'], 'isodatetime': ['isodatetime', 'datetime', 'date'], 'long': ['int64', 'long'], 'numeric': ['numeric'], 'object': ['object'], 'region': ['region'], 'short': ['int16', 'short'], 'uint16': ['uint16'], 'uint32': ['uint32', 'uint'], 'uint64': ['uint64'], 'uint8': ['uint8'], 'utf': ['text', 'utf', 'utf8', 'utf-8']}¶
- recommended_primary_dtypes = ['float', 'double', 'short', 'int', 'long', 'utf', 'ascii', 'bool', 'int8', 'uint8', 'uint16', 'uint32', 'uint64', 'object', 'region', 'numeric', 'isodatetime']¶
- valid_primary_dtypes = {'ascii', 'bool', 'bytes', 'date', 'datetime', 'double', 'float', 'float32', 'float64', 'int', 'int16', 'int32', 'int64', 'int8', 'isodatetime', 'long', 'numeric', 'object', 'region', 'short', 'text', 'uint', 'uint16', 'uint32', 'uint64', 'uint8', 'utf', 'utf-8', 'utf8'}¶
- static simplify_cpd_type(cpd_type)¶
Transform a list of DtypeSpecs into a list of strings. Use for simple representation of compound type and validation.
- Parameters:
cpd_type (list) – The list of DtypeSpecs to simplify
- static check_dtype(dtype)¶
Check that the dtype string is a reference or a valid primary dtype.
- class hdmf.spec.spec.ConstructableDict¶
Bases:
dict
- classmethod build_const_args(spec_dict)¶
Build constructor arguments for this ConstructableDict class from a dictionary
- classmethod build_spec(spec_dict)¶
Build a Spec object from the given Spec dict
- class hdmf.spec.spec.Spec(doc, name=None, required=True, parent=None)¶
Bases:
ConstructableDict
A base specification class
- Parameters:
- property doc¶
Documentation on what this Spec is specifying
- property name¶
The name of the object being specified
- property parent¶
The parent specification of this specification
- classmethod build_const_args(spec_dict)¶
Build constructor arguments for this Spec class from a dictionary
- property path¶
- class hdmf.spec.spec.RefSpec(target_type, reftype)¶
Bases:
ConstructableDict
- Parameters:
- property target_type¶
The data_type of the target of the reference
- property reftype¶
The type of reference
- class hdmf.spec.spec.AttributeSpec(name, doc, dtype, shape=None, dims=None, required=True, parent=None, value=None, default_value=None)¶
Bases:
Spec
Specification for attributes
- Parameters:
  - name (str) – The name of this attribute
  - doc (str) – a description about what this specification represents
  - required (bool) – whether or not this attribute is required; ignored when “value” is specified
  - parent (BaseStorageSpec) – the parent of this spec
  - value (None) – a constant value for this attribute
  - default_value (None) – a default value for this attribute
- property dtype¶
The data type of the attribute
- property value¶
The constant value of the attribute. “None” if this attribute is not constant
- property default_value¶
The default value of the attribute. “None” if this attribute has no default value
- property required¶
True if this attribute is required, False otherwise.
- property dims¶
The dimensions of this attribute’s value
- property shape¶
The shape of this attribute’s value
- classmethod build_const_args(spec_dict)¶
Build constructor arguments for this Spec class from a dictionary
- class hdmf.spec.spec.BaseStorageSpec(doc, name=None, default_name=None, attributes=[], linkable=True, quantity=1, data_type_def=None, data_type_inc=None)¶
Bases:
Spec
A specification for any object that can hold attributes.
- Parameters:
  - doc (str) – a description about what this specification represents
  - name (str) – the name of this base storage container, allowed only if quantity is not ‘+’ or ‘*’
  - default_name (str) – The default name of this base storage container, used only if name is None
  - attributes (list) – the attributes on this group
  - linkable (bool) – whether or not this group can be linked
  - quantity (str or int) – the required number of allowed instances
  - data_type_def (str) – the data type this specification represents
  - data_type_inc (str or BaseStorageSpec) – the data type this specification extends
- property default_name¶
The default name for this spec
- property resolved¶
- property required¶
Whether or not this spec represents a required field
- resolve_spec(inc_spec)¶
Add attributes from the inc_spec to this spec and track which attributes are new and overridden.
- Parameters:
inc_spec (
BaseStorageSpec
) – the data type this specification represents
- is_inherited_spec(spec)¶
Return True if this spec was inherited from the parent type, False otherwise.
Returns False if the spec is not found.
- is_overridden_spec(spec)¶
Return True if this spec overrides a specification from the parent type, False otherwise.
Returns False if the spec is not found.
- is_inherited_attribute(name)¶
Return True if the attribute was inherited from the parent type, False otherwise.
Raises a ValueError if the spec is not found.
- Parameters:
name (
str
) – the name of the attribute to check
- is_overridden_attribute(name)¶
Return True if the given attribute overrides the specification from the parent, False otherwise.
Raises a ValueError if the spec is not found.
- Parameters:
name (
str
) – the name of the attribute to check
- is_many()¶
- classmethod get_data_type_spec(data_type_def)¶
- classmethod get_namespace_spec()¶
- property attributes¶
Tuple of attribute specifications for this specification
- property linkable¶
True if object can be a link, False otherwise
- classmethod id_key()¶
Get the key used to store data ID on an instance
Override this method to use a different name for ‘object_id’
- classmethod type_key()¶
Get the key used to store data type on an instance
Override this method to use a different name for ‘data_type’. HDMF supports combining schema that uses ‘data_type’ and at most one different name for ‘data_type’.
- classmethod inc_key()¶
Get the key used to define a data_type include.
Override this method to use a different keyword for ‘data_type_inc’. HDMF supports combining schema that uses ‘data_type_inc’ and at most one different name for ‘data_type_inc’.
- classmethod def_key()¶
Get the key used to define a data_type definition.
Override this method to use a different keyword for ‘data_type_def’ HDMF supports combining schema that uses ‘data_type_def’ and at most one different name for ‘data_type_def’.
- property data_type_inc¶
The data type this specification inherits
- property data_type_def¶
The data type this specification defines
- property data_type¶
The data type of this specification
- property quantity¶
The number of times the object being specified should be present
- add_attribute(name, doc, dtype, shape=None, dims=None, required=True, parent=None, value=None, default_value=None)¶
Add an attribute to this specification
- Parameters:
  - name (str) – The name of this attribute
  - doc (str) – a description about what this specification represents
  - required (bool) – whether or not this attribute is required; ignored when “value” is specified
  - parent (BaseStorageSpec) – the parent of this spec
  - value (None) – a constant value for this attribute
  - default_value (None) – a default value for this attribute
- set_attribute(spec)¶
Set an attribute on this specification
- Parameters:
spec (
AttributeSpec
) – the specification for the attribute to add
- get_attribute(name)¶
Get an attribute on this specification
- Parameters:
name (
str
) – the name of the attribute to get the Spec for
- classmethod build_const_args(spec_dict)¶
Build constructor arguments for this Spec class from a dictionary
- class hdmf.spec.spec.DtypeSpec(name, doc, dtype)¶
Bases:
ConstructableDict
A class for specifying a component of a compound type
- Parameters:
- property doc¶
Documentation about this component
- property name¶
The name of this component
- property dtype¶
The data type of this component
- static assertValidDtype(dtype)¶
- static check_valid_dtype(dtype)¶
- classmethod build_const_args(spec_dict)¶
Build constructor arguments for this Spec class from a dictionary
- class hdmf.spec.spec.DatasetSpec(doc, dtype=None, name=None, default_name=None, shape=None, dims=None, attributes=[], linkable=True, quantity=1, default_value=None, data_type_def=None, data_type_inc=None)¶
Bases:
BaseStorageSpec
Specification for datasets
To specify a table-like dataset (i.e., a compound data type), use a list of DtypeSpecs as the dtype.
- Parameters:
  - doc (str) – a description about what this specification represents
  - dtype (str or list or RefSpec) – The data type of this dataset. Use a list of DtypeSpecs to specify a compound data type.
  - name (str) – The name of this dataset
  - default_name (str) – The default name of this dataset
  - attributes (list) – the attributes on this dataset
  - linkable (bool) – whether or not this dataset can be linked
  - quantity (str or int) – the required number of allowed instances
  - default_value (None) – a default value for this dataset
  - data_type_def (str) – the data type this specification represents
  - data_type_inc (str or DatasetSpec) – the data type this specification extends
- resolve_spec(inc_spec)¶
- Parameters:
inc_spec (
DatasetSpec
) – the data type this specification represents
- property dims¶
The dimensions of this Dataset
- property dtype¶
The data type of the Dataset
- property shape¶
The shape of the dataset
- property default_value¶
The default value of the dataset or None if not specified
- classmethod dtype_spec_cls()¶
The class to use when constructing DtypeSpec objects
Override this if extending to use a class other than DtypeSpec to build dataset specifications
- classmethod build_const_args(spec_dict)¶
Build constructor arguments for this Spec class from a dictionary
- class hdmf.spec.spec.LinkSpec(doc, target_type, quantity=1, name=None)¶
Bases:
Spec
- Parameters:
- property target_type¶
The data type of target specification
- property data_type_inc¶
The data type of target specification
- is_many()¶
- property quantity¶
The number of times the object being specified should be present
- property required¶
Whether or not this spec represents a required field
- class hdmf.spec.spec.GroupSpec(doc, name=None, default_name=None, groups=[], datasets=[], attributes=[], links=[], linkable=True, quantity=1, data_type_def=None, data_type_inc=None)¶
Bases:
BaseStorageSpec
Specification for groups
- Parameters:
  - doc (str) – a description about what this specification represents
  - name (str) – the name of the Group that is written to the file. If this argument is omitted, users will be required to enter a name field when creating instances of this data type in the API. Another option is to specify default_name, in which case this name will be used as the name of the Group if no other name is provided.
  - default_name (str) – The default name of this group
  - groups (list) – the subgroups in this group
  - datasets (list) – the datasets in this group
  - attributes (list) – the attributes on this group
  - links (list) – the links in this group
  - linkable (bool) – whether or not this group can be linked
  - quantity (str or int) – the allowable number of instances of this group in a certain location. See table of options here. Note that if you specify name, quantity cannot be '*', '+', or an integer greater than 1, because you cannot have more than one group of the same name in the same parent group.
  - data_type_def (str) – the data type this specification represents
  - data_type_inc (str or GroupSpec) – the data type this specification extends
- resolve_spec(inc_spec)¶
- Parameters:
inc_spec (
GroupSpec
) – the data type this specification represents
- is_inherited_dataset(name)¶
Return true if a dataset with the given name was inherited
- Parameters:
name (
str
) – the name of the dataset
- is_overridden_dataset(name)¶
Return true if a dataset with the given name overrides a specification from the parent type
- Parameters:
name (
str
) – the name of the dataset
- is_inherited_group(name)¶
Return true if a group with the given name was inherited
- Parameters:
name (
str
) – the name of the group
- is_overridden_group(name)¶
Return true if a group with the given name overrides a specification from the parent type
- Parameters:
name (
str
) – the name of the group
- is_inherited_link(name)¶
Return true if a link with the given name was inherited
- Parameters:
name (
str
) – the name of the link
- is_overridden_link(name)¶
Return true if a link with the given name overrides a specification from the parent type
- Parameters:
name (
str
) – the name of the link
- is_inherited_spec(spec)¶
Returns ‘True’ if specification was inherited from a parent type
- is_overridden_spec(spec)¶
Returns ‘True’ if specification overrides a specification from the parent type
- is_inherited_type(spec)¶
Returns True if spec represents a data type that was inherited
- Parameters:
spec (
BaseStorageSpec
orstr
) – the specification to check
- is_overridden_type(spec)¶
Returns True if spec represents a data type that overrides a specification from a parent type
- Parameters:
spec (
BaseStorageSpec
orstr
) – the specification to check
- is_inherited_target_type(spec)¶
Returns True if spec represents a target type that was inherited
- is_overridden_target_type(spec)¶
Returns True if spec represents a target type that overrides a specification from a parent type
- get_data_type(data_type)¶
Get a specification by “data_type”
NOTE: If there is only one spec for a given data type, then it is returned. If there are multiple specs for a given data type and they are all named, then they are returned in a list. If there are multiple specs for a given data type and only one is unnamed, then the unnamed spec is returned. The other named specs can be returned using get_group or get_dataset.
NOTE: this method looks for an exact match of the data type and does not consider the type hierarchy.
- Parameters:
data_type (
str
) – the data_type to retrieve
- get_target_type(target_type)¶
Get a specification by “target_type”
NOTE: If there is only one spec for a given target type, then it is returned. If there are multiple specs for a given target type and they are all named, then they are returned in a list. If there are multiple specs for a given target type and only one is unnamed, then the unnamed spec is returned. The other named specs can be returned using get_link.
NOTE: this method looks for an exact match of the target type and does not consider the type hierarchy.
- Parameters:
target_type (
str
) – the target_type to retrieve
- property groups¶
The groups specified in this GroupSpec
- property datasets¶
The datasets specified in this GroupSpec
- property links¶
The links specified in this GroupSpec
- add_group(doc, name=None, default_name=None, groups=[], datasets=[], attributes=[], links=[], linkable=True, quantity=1, data_type_def=None, data_type_inc=None)¶
Add a new specification for a subgroup to this group specification
- Parameters:
  - doc (str) – a description about what this specification represents
  - name (str) – the name of the Group that is written to the file. If this argument is omitted, users will be required to enter a name field when creating instances of this data type in the API. Another option is to specify default_name, in which case this name will be used as the name of the Group if no other name is provided.
  - default_name (str) – The default name of this group
  - groups (list) – the subgroups in this group
  - datasets (list) – the datasets in this group
  - attributes (list) – the attributes on this group
  - links (list) – the links in this group
  - linkable (bool) – whether or not this group can be linked
  - quantity (str or int) – the allowable number of instances of this group in a certain location. See table of options here. Note that if you specify name, quantity cannot be '*', '+', or an integer greater than 1, because you cannot have more than one group of the same name in the same parent group.
  - data_type_def (str) – the data type this specification represents
  - data_type_inc (str or GroupSpec) – the data type this specification extends
- set_group(spec)¶
Add the given specification for a subgroup to this group specification
- Parameters:
spec (
GroupSpec
) – the specification for the subgroup
- get_group(name)¶
Get a specification for a subgroup to this group specification
- Parameters:
name (
str
) – the name of the group to get the Spec for
- add_dataset(doc, dtype=None, name=None, default_name=None, shape=None, dims=None, attributes=[], linkable=True, quantity=1, default_value=None, data_type_def=None, data_type_inc=None)¶
Add a new specification for a dataset to this group specification
- Parameters:
  - doc (str) – a description about what this specification represents
  - dtype (str or list or RefSpec) – The data type of this dataset. Use a list of DtypeSpecs to specify a compound data type.
  - name (str) – The name of this dataset
  - default_name (str) – The default name of this dataset
  - attributes (list) – the attributes on this dataset
  - linkable (bool) – whether or not this dataset can be linked
  - quantity (str or int) – the required number of allowed instances
  - default_value (None) – a default value for this dataset
  - data_type_def (str) – the data type this specification represents
  - data_type_inc (str or DatasetSpec) – the data type this specification extends
- set_dataset(spec)¶
Add the given specification for a dataset to this group specification
- Parameters:
spec (
DatasetSpec
) – the specification for the dataset
- get_dataset(name)¶
Get a specification for a dataset to this group specification
- Parameters:
name (
str
) – the name of the dataset to get the Spec for
- add_link(doc, target_type, quantity=1, name=None)¶
Add a new specification for a link to this group specification
- set_link(spec)¶
Add a given specification for a link to this group specification
- Parameters:
spec (
LinkSpec
) – the specification for the object to link to
- get_link(name)¶
Get a specification for a link to this group specification
- Parameters:
name (
str
) – the name of the link to get the Spec for
- classmethod dataset_spec_cls()¶
The class to use when constructing DatasetSpec objects
Override this if extending to use a class other than DatasetSpec to build dataset specifications
- classmethod link_spec_cls()¶
The class to use when constructing LinkSpec objects
Override this if extending to use a class other than LinkSpec to build link specifications
- classmethod build_const_args(spec_dict)¶
Build constructor arguments for this Spec class from a dictionary
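As a sketch of how these spec classes compose, the example below defines a hypothetical MeasurementSeries group with one dataset and an attribute, once by passing nested specs to the constructor and once via add_dataset and add_attribute; all type names, docs, and shapes are invented for illustration.

from hdmf.spec import AttributeSpec, DatasetSpec, GroupSpec

data_spec = DatasetSpec(
    doc="a 1-D dataset of measurements",
    dtype="float",
    name="data",
    shape=[None],
    attributes=[AttributeSpec(name="unit", doc="unit of measurement", dtype="text")],
)

series_spec = GroupSpec(
    doc="a container holding one measurement dataset",
    data_type_def="MeasurementSeries",
    datasets=[data_spec],
)

# Equivalent incremental construction
series_spec2 = GroupSpec(doc="a container holding one measurement dataset",
                         data_type_def="MeasurementSeries2")
series_spec2.add_dataset(doc="a 1-D dataset of measurements", dtype="float",
                         name="data", shape=[None])
series_spec2.add_attribute(name="source", doc="where the data came from", dtype="text")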
hdmf.spec.write module¶
- class hdmf.spec.write.SpecWriter¶
Bases:
object
- abstract write_spec(spec_file_dict, path)¶
- abstract write_namespace(namespace, path)¶
- class hdmf.spec.write.YAMLSpecWriter(outdir='.')¶
Bases:
SpecWriter
- Parameters:
  - outdir (str) – the path to the directory in which to output the namespace and specs
- write_spec(spec_file_dict, path)¶
- write_namespace(namespace, path)¶
Write the given namespace key-value pairs as YAML to the given path.
- Parameters:
namespace – SpecNamespace holding the key-value pairs that define the namespace
path – File path to write the namespace to as YAML under the key ‘namespaces’
- reorder_yaml(path)¶
Open a YAML file, load it as python data, sort the data alphabetically, and write it back out to the same path.
- sort_keys(obj)¶
- class hdmf.spec.write.NamespaceBuilder(doc, name, full_name=None, version=None, author=None, contact=None, date=None, namespace_cls=<class 'hdmf.spec.namespace.SpecNamespace'>)¶
Bases:
object
A class for building namespace and spec files
- Parameters:
  - doc (str) – Description about what the namespace represents
  - name (str) – Name of the namespace
  - full_name (str) – Extended full name of the namespace
  - version (str or tuple or list) – Version number of the namespace
  - contact (str or list) – List of emails. Ordering should be the same as for author
  - date (datetime or str) – Date last modified or released. Formatting is %Y-%m-%d %H:%M:%S, e.g., 2017-04-25 17:14:13
  - namespace_cls (type) – the SpecNamespace type
- add_spec(source, spec)¶
Add a Spec to the namespace
- Parameters:
  - source (str) – the path to write the spec to
  - spec (GroupSpec or DatasetSpec) – the Spec to add
- add_source(source, doc=None, title=None)¶
Add a source file to the namespace
- include_type(data_type, source=None, namespace=None)¶
Include a data type from an existing namespace or source
- include_namespace(namespace)¶
Include an entire namespace
- Parameters:
namespace (
str
) – the namespace to include
- export(path, outdir='.', writer=None)¶
Export the namespace to the given path.
All new specification source files will be written in the same directory as the given path.
- Parameters:
  - path (str) – the path to write the spec to
  - outdir (str) – the path to the directory in which to output the namespace and specs
  - writer (SpecWriter) – the SpecWriter to use to write the namespace
- property name¶
- class hdmf.spec.write.SpecFileBuilder¶
Bases:
dict
- add_spec(spec)¶
- Parameters:
spec (
GroupSpec
orDatasetSpec
) – the Spec to add
- hdmf.spec.write.export_spec(ns_builder, new_data_types, output_dir)¶
Create YAML specification files for a new namespace and extensions with the given data type specs.
- Parameters:
ns_builder – NamespaceBuilder instance used to build the namespace and extension
new_data_types – Iterable of specs that represent new data types to be added
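Putting NamespaceBuilder and export_spec together, a typical extension workflow might look like the sketch below; the namespace name, version, author details, and the LabContainer type are all hypothetical placeholders.

from hdmf.spec import GroupSpec, NamespaceBuilder
from hdmf.spec.write import export_spec

ns_builder = NamespaceBuilder(
    doc="an example extension namespace",
    name="my-lab-extension",
    version="0.1.0",
    author="Jane Doe",
    contact="jane.doe@example.com",
)

new_type = GroupSpec(doc="an example new container type",
                     data_type_def="LabContainer")

# Writes my-lab-extension.namespace.yaml and an extension spec file into output_dir
export_spec(ns_builder, [new_type], output_dir=".")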
Module contents¶
hdmf.backends package¶
Subpackages¶
hdmf.backends.hdf5 package¶
Submodules¶
hdmf.backends.hdf5.h5_utils module¶
Utilities for the HDF5 I/O backend, e.g., for wrapping HDF5 datasets on read, wrapping arrays for configuring write, or writing the spec among others
- class hdmf.backends.hdf5.h5_utils.HDF5IODataChunkIteratorQueue¶
Bases:
deque
Helper class used by HDF5IO to manage the write for DataChunkIterators
Each queue element must be a tuple of two elements: 1) the dataset to write to and 2) the AbstractDataChunkIterator with the data
- exhaust_queue()¶
Read and write from any queued DataChunkIterators in a round-robin fashion
- append(dataset, data)¶
Append a value to the queue
- Parameters:
  - dataset (Dataset) – The dataset where the DataChunkIterator is written to
  - data (AbstractDataChunkIterator) – DataChunkIterator with the data to be written
- class hdmf.backends.hdf5.h5_utils.H5Dataset(dataset, io)¶
Bases:
HDMFDataset
- Parameters:
- property io¶
- property regionref¶
- property ref¶
- property shape¶
- class hdmf.backends.hdf5.h5_utils.DatasetOfReferences(dataset, io)¶
Bases:
H5Dataset
,ReferenceResolver
An extension of the base ReferenceResolver class to add more abstract methods for subclasses that will read HDF5 references
- Parameters:
- abstract get_object(h5obj)¶
Map an HDF5 object to a Builder or Container.
- invert()¶
Return an object that defers reference resolution but in the opposite direction.
- class hdmf.backends.hdf5.h5_utils.BuilderResolverMixin¶
Bases:
BuilderResolver
A mixin for adding to HDF5 reference-resolving types the get_object method that returns Builders
- get_object(h5obj)¶
Map an HDF5 object to a Builder.
- class hdmf.backends.hdf5.h5_utils.ContainerResolverMixin¶
Bases:
ContainerResolver
A mixin for adding to HDF5 reference-resolving types the get_object method that returns Containers
- get_object(h5obj)¶
Map an HDF5 object to a Container.
- class hdmf.backends.hdf5.h5_utils.AbstractH5TableDataset(dataset, io, types)¶
Bases:
DatasetOfReferences
- Parameters:
- property types¶
- property dtype¶
- __getitem__(arg)¶
- resolve(manager)¶
- class hdmf.backends.hdf5.h5_utils.AbstractH5ReferenceDataset(dataset, io)¶
Bases:
DatasetOfReferences
- Parameters:
- __getitem__(arg)¶
- property dtype¶
- class hdmf.backends.hdf5.h5_utils.AbstractH5RegionDataset(dataset, io)¶
Bases:
AbstractH5ReferenceDataset
- Parameters:
- __getitem__(arg)¶
- property dtype¶
- class hdmf.backends.hdf5.h5_utils.ContainerH5TableDataset(dataset, io, types)¶
Bases:
ContainerResolverMixin
,AbstractH5TableDataset
A reference-resolving dataset for resolving references inside tables (i.e. compound dtypes) that returns resolved references as Containers
- Parameters:
- classmethod get_inverse_class()¶
Return the class that represents the ReferenceResolver that resolves references to the opposite type.
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
- class hdmf.backends.hdf5.h5_utils.BuilderH5TableDataset(dataset, io, types)¶
Bases:
BuilderResolverMixin
,AbstractH5TableDataset
A reference-resolving dataset for resolving references inside tables (i.e. compound dtypes) that returns resolved references as Builders
- Parameters:
- classmethod get_inverse_class()¶
Return the class that represents the ReferenceResolver that resolves references to the opposite type.
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
- class hdmf.backends.hdf5.h5_utils.ContainerH5ReferenceDataset(dataset, io)¶
Bases:
ContainerResolverMixin
,AbstractH5ReferenceDataset
A reference-resolving dataset for resolving object references that returns resolved references as Containers
- Parameters:
- classmethod get_inverse_class()¶
Return the class that represents the ReferenceResolver that resolves references to the opposite type.
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
- class hdmf.backends.hdf5.h5_utils.BuilderH5ReferenceDataset(dataset, io)¶
Bases:
BuilderResolverMixin
,AbstractH5ReferenceDataset
A reference-resolving dataset for resolving object references that returns resolved references as Builders
- Parameters:
- classmethod get_inverse_class()¶
Return the class that represents the ReferenceResolver that resolves references to the opposite type.
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
- class hdmf.backends.hdf5.h5_utils.ContainerH5RegionDataset(dataset, io)¶
Bases:
ContainerResolverMixin
,AbstractH5RegionDataset
A reference-resolving dataset for resolving region references that returns resolved references as Containers
- Parameters:
- classmethod get_inverse_class()¶
Return the class that represents the ReferenceResolver that resolves references to the opposite type.
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
- class hdmf.backends.hdf5.h5_utils.BuilderH5RegionDataset(dataset, io)¶
Bases:
BuilderResolverMixin
,AbstractH5RegionDataset
A reference-resolving dataset for resolving region references that returns resolved references as Builders
- Parameters:
- classmethod get_inverse_class()¶
Return the class that represents the ReferenceResolver that resolves references to the opposite type.
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
- class hdmf.backends.hdf5.h5_utils.H5SpecWriter(group)¶
Bases:
SpecWriter
- Parameters:
  - group (Group) – the HDF5 group to write specs to
- static stringify(spec)¶
Converts a spec into a JSON string to write to a dataset
- write_spec(spec, path)¶
- write_namespace(namespace, path)¶
- class hdmf.backends.hdf5.h5_utils.H5SpecReader(group)¶
Bases:
SpecReader
Class that reads cached JSON-formatted namespace and spec data from an HDF5 group.
- Parameters:
  - group (Group) – the HDF5 group to read specs from
- read_spec(spec_path)¶
- read_namespace(ns_path)¶
- class hdmf.backends.hdf5.h5_utils.H5RegionSlicer(dataset, region)¶
Bases:
RegionSlicer
- Parameters:
- __getitem__(idx)¶
Must be implemented by subclasses
- class hdmf.backends.hdf5.h5_utils.H5DataIO(data=None, maxshape=None, chunks=None, compression=None, compression_opts=None, fillvalue=None, shuffle=None, fletcher32=None, link_data=False, allow_plugin_filters=False, shape=None, dtype=None)¶
Bases:
DataIO
Wrap data arrays for write via HDF5IO to customize I/O behavior, such as compression and chunking for data arrays.
- Parameters:
  - data (ndarray or list or tuple or Dataset or Iterable) – the data to be written. NOTE: If an h5py.Dataset is used, all other settings but link_data will be ignored as the dataset will either be linked to or copied as is in H5DataIO.
  - maxshape (tuple) – Dataset will be resizable up to this shape (Tuple). Automatically enables chunking. Use None for the axes you want to be unlimited.
  - chunks (bool or tuple) – Chunk shape or True to enable auto-chunking
  - compression (str or bool or int) – Compression strategy. If a bool is given, then gzip compression will be used by default. http://docs.h5py.org/en/latest/high/dataset.html#dataset-compression
  - compression_opts (int or tuple) – Parameter for compression filter
  - fillvalue (None) – Value to be returned when reading uninitialized parts of the dataset
  - shuffle (bool) – Enable shuffle I/O filter. http://docs.h5py.org/en/latest/high/dataset.html#dataset-shuffle
  - fletcher32 (bool) – Enable fletcher32 checksum. http://docs.h5py.org/en/latest/high/dataset.html#dataset-fletcher32
  - link_data (bool) – If data is an h5py.Dataset should it be linked to or copied. NOTE: This parameter is only allowed if data is an h5py.Dataset
  - allow_plugin_filters (bool) – Enable passing dynamically loaded filters as compression parameter
  - shape (tuple) – the shape of the new dataset, used only if data is None
  - dtype (str or type or dtype) – the data type of the new dataset, used only if data is None
- property dataset¶
- get_io_params()¶
Returns a dict with the I/O parameters specified in this DataIO.
- static filter_available(filter, allow_plugin_filters)¶
Check if a given I/O filter is available
- property link_data¶
- property io_settings¶
- property valid¶
bool indicating if the data object is valid
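For instance, an in-memory array can be wrapped with H5DataIO to request gzip compression, chunking, and an unlimited first axis before it is handed to a write call; the shapes and filter settings below are arbitrary.

import numpy as np
from hdmf.backends.hdf5 import H5DataIO

data = np.random.rand(1000, 100)

wrapped = H5DataIO(data=data,
                   compression="gzip",
                   compression_opts=4,
                   chunks=(100, 100),
                   shuffle=True,
                   maxshape=(None, 100))

# io_settings holds the keyword arguments that will be forwarded to h5py on write
print(wrapped.io_settings)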
hdmf.backends.hdf5.h5tools module¶
- class hdmf.backends.hdf5.h5tools.HDF5IO(path=None, mode='r', manager=None, comm=None, file=None, driver=None, aws_region=None, herd_path=None)¶
Bases:
HDMFIO
Open an HDF5 file for IO.
- Parameters:
  - mode (str) – the mode to open the HDF5 file with, one of (“w”, “r”, “r+”, “a”, “w-”, “x”). See h5py.File for more details.
  - manager (TypeMap or BuildManager) – the BuildManager or a TypeMap to construct a BuildManager to use for I/O
  - comm (Intracomm) – the MPI communicator to use for parallel I/O
  - file (File or S3File or RemFile) – a pre-existing h5py.File, S3File, or RemFile object
  - driver (str) – driver for h5py to use when opening HDF5 file
  - aws_region (str) – If driver is ros3, then specify the aws region of the url.
  - herd_path (str) – The path to read/write the HERD file
- static can_read(path)¶
Determines whether a given path is readable by the HDF5IO class
- property comm¶
The MPI communicator to use for parallel I/O.
- property driver¶
- property aws_region¶
- classmethod load_namespaces(namespace_catalog, path=None, namespaces=None, file=None, driver=None, aws_region=None)¶
Load cached namespaces from a file.
If file is not supplied, then an h5py.File object will be opened for the given path, the namespaces will be read, and the File object will be closed. If file is supplied, then the given File object will be read from and not closed.
- raises ValueError:
if both path and file are supplied but path is not the same as the path of file.
- Parameters:
  - namespace_catalog (NamespaceCatalog or TypeMap) – the NamespaceCatalog or TypeMap to load namespaces into
  - namespaces (list) – the namespaces to load
  - file (File) – a pre-existing h5py.File object
  - driver (str) – driver for h5py to use when opening HDF5 file
  - aws_region (str) – If driver is ros3, then specify the aws region of the url.
- Returns:
dict mapping the names of the loaded namespaces to a dict mapping included namespace names and the included data types
- Return type:
- classmethod get_namespaces(path=None, file=None, driver=None, aws_region=None)¶
Get the names and versions of the cached namespaces from a file.
If file is not supplied, then an h5py.File object will be opened for the given path, the namespaces will be read, and the File object will be closed. If file is supplied, then the given File object will be read from and not closed. If there are multiple versions of a namespace cached in the file, then only the latest one (using alphanumeric ordering) is returned. This is the version of the namespace that is loaded by HDF5IO.load_namespaces(…).
- raises ValueError:
if both path and file are supplied but path is not the same as the path of file.
- Parameters:
- Returns:
dict mapping names to versions of the namespaces in the file
- Return type:
- classmethod copy_file(source_filename, dest_filename, expand_external=True, expand_refs=False, expand_soft=False)¶
Convenience function to copy an HDF5 file while allowing external links to be resolved.
Warning
As of HDMF 2.0, this method is no longer supported and may be removed in a future version. Please use the export method or h5py.File.copy method instead.
Note
The source file will be opened in ‘r’ mode and the destination file will be opened in ‘w’ mode using h5py. To avoid possible collisions, care should be taken that, e.g., the source file is not opened already when calling this function.
- Parameters:
  - source_filename (str) – the path to the HDF5 file to copy
  - dest_filename (str) – the name of the destination file
  - expand_external (bool) – expand external links into new objects
  - expand_refs (bool) – copy objects which are pointed to by reference
  - expand_soft (bool) – expand soft links into new objects
- write(container, cache_spec=True, link_data=True, exhaust_dci=True, herd=None)¶
Write the container to an HDF5 file.
- Parameters:
  - container (Container) – the Container object to write
  - cache_spec (bool) – If True (default), cache specification to file (highly recommended). If False, do not cache specification to file. The appropriate specification will then need to be loaded prior to reading the file.
  - link_data (bool) – If True (default), create external links to HDF5 Datasets. If False, copy HDF5 Datasets.
  - exhaust_dci (bool) – If True (default), exhaust DataChunkIterators one at a time. If False, exhaust them concurrently.
  - herd (HERD) – A HERD object to populate with references.
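A minimal write/read round trip with HDF5IO might look like the following sketch, using the hdmf-common BuildManager and a DynamicTable; the file name example.h5 and table name are placeholders.

from hdmf.backends.hdf5 import HDF5IO
from hdmf.common import DynamicTable, get_manager

table = DynamicTable(name="example_table", description="a table used only for illustration")

# Write the container to a new HDF5 file
with HDF5IO("example.h5", mode="w", manager=get_manager()) as io:
    io.write(table)

# Read it back
with HDF5IO("example.h5", mode="r", manager=get_manager()) as io:
    read_table = io.read()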
- export(src_io, container=None, write_args=None, cache_spec=True)¶
Export data read from a file from any backend to HDF5.
See hdmf.backends.io.HDMFIO.export for more details.
- Parameters:
  - src_io (HDMFIO) – the HDMFIO object for reading the data to export
  - container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported
  - write_args (dict) – arguments to pass to write_builder
  - cache_spec (bool) – whether to cache the specification to file
- classmethod export_io(path, src_io, comm=None, container=None, write_args=None, cache_spec=True)¶
Export from one backend to HDF5 (class method).
- Parameters:
  - path (str) – the path to the destination HDF5 file
  - src_io (HDMFIO) – the HDMFIO object for reading the data to export
  - comm (Intracomm) – the MPI communicator to use for parallel I/O
  - container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported
  - write_args (dict) – arguments to pass to write_builder
  - cache_spec (bool) – whether to cache the specification to file
- read()¶
Read a container from the IO source.
- Returns:
the Container object that was read in
- Return type:
- read_builder()¶
Read data and return the GroupBuilder representing it.
NOTE: On read, the Builder.source will usually not be set on the Builders. NOTE: The Builder.location is used internally to ensure correct handling of links (in particular on export) and should be set on read for all GroupBuilder, DatasetBuilder, and LinkBuilder objects.
- Returns:
a GroupBuilder representing the data object
- Return type:
- get_written(builder)¶
Return True if this builder has been written to (or read from) disk by this IO object, False otherwise.
- Parameters:
builder (Builder) – Builder object to get the written flag for
- Returns:
True if the builder is found in self._written_builders using the builder ID, False otherwise
- get_builder(h5obj)¶
Get the builder for the corresponding h5py Group or Dataset
- raises ValueError:
When no builder has been constructed yet for the given h5py object
- get_container(h5obj)¶
Get the container for the corresponding h5py Group or Dataset
- raises ValueError:
When no builder has been constructed yet for the given h5py object
- open()¶
Open this HDMFIO object for writing of the builder
- close(close_links=True)¶
Close this file and any files linked to from this file.
- Parameters:
close_links (bool) – Whether to close all files linked to from this file. (default: True)
- close_linked_files()¶
Close all opened, linked-to files.
MacOS and Linux automatically release the linked-to file after the linking file is closed, but Windows does not, which prevents the linked-to file from being deleted or truncated. Use this method to close all opened, linked-to files.
- write_builder(builder, link_data=True, exhaust_dci=True, export_source=None)¶
- Parameters:
  - builder (GroupBuilder) – the GroupBuilder object representing the HDF5 file
  - link_data (bool) – If not specified otherwise link (True) or copy (False) HDF5 Datasets
  - exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, exhaust them concurrently
  - export_source (str) – The source of the builders when exporting
- classmethod get_type(data)¶
- set_attributes(obj, attributes)¶
- write_group(parent, builder, link_data=True, exhaust_dci=True, export_source=None)¶
- Parameters:
  - parent (Group) – the parent HDF5 object
  - builder (GroupBuilder) – the GroupBuilder to write
  - link_data (bool) – If not specified otherwise link (True) or copy (False) HDF5 Datasets
  - exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, exhaust them concurrently
  - export_source (str) – The source of the builders when exporting
- Returns:
the Group that was created
- Return type:
- write_link(parent, builder, export_source=None)¶
- Parameters:
  - parent (Group) – the parent HDF5 object
  - builder (LinkBuilder) – the LinkBuilder to write
  - export_source (str) – The source of the builders when exporting
- Returns:
the Link that was created
- Return type:
- write_dataset(parent, builder, link_data=True, exhaust_dci=True, export_source=None)¶
Write a dataset to HDF5
The function uses other dataset-dependent write functions, e.g., __scalar_fill__, __list_fill__, and __setup_chunked_dset__, to write the data.
- Parameters:
  - parent (Group) – the parent HDF5 object
  - builder (DatasetBuilder) – the DatasetBuilder to write
  - link_data (bool) – If not specified otherwise link (True) or copy (False) HDF5 Datasets
  - exhaust_dci (bool) – exhaust DataChunkIterators one at a time. If False, exhaust them concurrently
  - export_source (str) – The source of the builders when exporting
- Returns:
the Dataset that was created
- Return type:
- property mode¶
Return the HDF5 file mode. One of (“w”, “r”, “r+”, “a”, “w-”, “x”).
- classmethod set_dataio(data=None, maxshape=None, chunks=None, compression=None, compression_opts=None, fillvalue=None, shuffle=None, fletcher32=None, link_data=False, allow_plugin_filters=False, shape=None, dtype=None)¶
Wrap the given Data object with an H5DataIO.
This method is provided merely for convenience. It is the equivalent of the following:
from hdmf.backends.hdf5 import H5DataIO
data = ...
data = H5DataIO(data)
- Parameters:
  - data (ndarray or list or tuple or Dataset or Iterable) – the data to be written. NOTE: If an h5py.Dataset is used, all other settings but link_data will be ignored as the dataset will either be linked to or copied as is in H5DataIO.
  - maxshape (tuple) – Dataset will be resizable up to this shape (Tuple). Automatically enables chunking. Use None for the axes you want to be unlimited.
  - chunks (bool or tuple) – Chunk shape or True to enable auto-chunking
  - compression (str or bool or int) – Compression strategy. If a bool is given, then gzip compression will be used by default. http://docs.h5py.org/en/latest/high/dataset.html#dataset-compression
  - compression_opts (int or tuple) – Parameter for compression filter
  - fillvalue (None) – Value to be returned when reading uninitialized parts of the dataset
  - shuffle (bool) – Enable shuffle I/O filter. http://docs.h5py.org/en/latest/high/dataset.html#dataset-shuffle
  - fletcher32 (bool) – Enable fletcher32 checksum. http://docs.h5py.org/en/latest/high/dataset.html#dataset-fletcher32
  - link_data (bool) – If data is an h5py.Dataset should it be linked to or copied. NOTE: This parameter is only allowed if data is an h5py.Dataset
  - allow_plugin_filters (bool) – Enable passing dynamically loaded filters as compression parameter
  - shape (tuple) – the shape of the new dataset, used only if data is None
  - dtype (str or type or dtype) – the data type of the new dataset, used only if data is None
Module contents¶
Submodules¶
hdmf.backends.errors module¶
Module for I/O backend errors
- exception hdmf.backends.errors.UnsupportedOperation¶
Bases:
ValueError
hdmf.backends.io module¶
- class hdmf.backends.io.HDMFIO(manager=None, source=None, herd_path=None)¶
Bases:
object
- Parameters:
  - manager (BuildManager) – the BuildManager to use for I/O
  - source (str or Path) – the source of container being built i.e. file path
  - herd_path (str) – The path to read/write the HERD file
- abstract static can_read(path)¶
Determines whether a given path is readable by this HDMFIO class
- property manager¶
The BuildManager this instance is using
- property source¶
The source of the container being read/written i.e. file path
- read()¶
Read a container from the IO source.
- Returns:
the Container object that was read in
- Return type:
- write(container, herd=None)¶
- export(src_io, container=None, write_args={}, clear_cache=False)¶
Export from one backend to the backend represented by this class.
If container is provided, then the build manager of src_io is used to build the container, and the resulting builder will be exported to the new backend. So if container is provided, src_io must have a non-None manager property. If container is None, then the contents of src_io will be read and exported to the new backend.
The provided container must be the root of the hierarchy of the source used to read the container (i.e., you cannot read a file and export a part of that file).
Arguments can be passed in for the write_builder method using write_args. Some arguments may not be supported during export.
Example usage:
old_io = HDF5IO('old.nwb', 'r')
with HDF5IO('new_copy.nwb', 'w') as new_io:
    new_io.export(old_io)
- NOTE: When implementing export support on custom backends, note that export does not update the Builder.source of the Builders. As such, when writing LinkBuilders, we need to determine whether LinkBuilder.source and LinkBuilder.builder.source are the same; if so, the link should be internal to the current file (even if the Builder.source points to a different location).
- Parameters:
  - src_io (HDMFIO) – the HDMFIO object for reading the data to export
  - container (Container) – the Container object to export. If None, then the entire contents of the HDMFIO object will be exported
  - write_args (dict) – arguments to pass to write_builder
  - clear_cache (bool) – whether to clear the build manager cache
- abstract read_builder()¶
Read data and return the GroupBuilder representing it
- Returns:
a GroupBuilder representing the read data
- Return type:
GroupBuilder
- abstract write_builder(builder)¶
Write a GroupBuilder representing an Container object
- Parameters:
builder (
GroupBuilder
) – the GroupBuilder object representing the Container
- abstract open()¶
Open this HDMFIO object for writing of the builder
- abstract close()¶
Close this HDMFIO object to further reading/writing
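To implement a custom backend, a subclass overrides the abstract methods above. The sketch below is a hypothetical in-memory backend (InMemoryIO is not part of HDMF) that simply stores the root GroupBuilder in an attribute; error handling and real serialization are omitted.

from hdmf.backends.io import HDMFIO
from hdmf.build import GroupBuilder

class InMemoryIO(HDMFIO):
    """Hypothetical backend that keeps the root builder in memory instead of on disk."""

    def __init__(self, manager=None, source=None):
        super().__init__(manager=manager, source=source)
        self._root = None

    @staticmethod
    def can_read(path):
        return False  # there is nothing on disk to read

    def open(self):
        pass  # no resources to open

    def close(self):
        pass  # no resources to release

    def write_builder(self, builder):
        self._root = builder  # "write" by keeping a reference to the GroupBuilder

    def read_builder(self):
        return self._root if self._root is not None else GroupBuilder(name="root")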
hdmf.backends.utils module¶
Module with utility functions and classes used for implementation of I/O backends
- class hdmf.backends.utils.WriteStatusTracker¶
Bases:
dict
Helper class used for tracking the write status of builders. I.e., to track whether a builder has been written or not.
- set_written(builder)¶
Mark this builder as written.
- Parameters:
builder (Builder) – Builder object to be marked as written
- get_written(builder)¶
Return True if this builder has been written to (or read from) disk by this IO object, False otherwise.
- Parameters:
builder (Builder) – Builder object to get the written flag for
- Returns:
True if the builder is found in self._written_builders using the builder ID, False otherwise
- class hdmf.backends.utils.NamespaceToBuilderHelper¶
Bases:
object
Helper class used in HDF5IO (and possibly elsewhere) to convert a namespace to a builder for I/O
- classmethod convert_namespace(ns_catalog, namespace)¶
Convert a namespace to a builder
- Parameters:
  - ns_catalog (NamespaceCatalog) – the namespace catalog with the specs
  - namespace (str) – the name of the namespace to be converted to a builder
hdmf.backends.warnings module¶
- exception hdmf.backends.warnings.BrokenLinkWarning¶
Bases:
UserWarning
Raised when a group has a key with a None value.
Module contents¶
hdmf.data_utils module¶
- hdmf.data_utils.append_data(data, arg)¶
- hdmf.data_utils.extend_data(data, arg)¶
Add all the elements of the iterable arg to the end of data.
- Parameters:
data (list, DataIO, numpy.ndarray, h5py.Dataset) – The array to extend
- class hdmf.data_utils.AbstractDataChunkIterator¶
Bases:
object
Abstract iterator class used to iterate over DataChunks.
Derived classes must ensure that all abstract methods and abstract properties are implemented, in particular, dtype, maxshape, __iter__, __next__, recommended_chunk_shape, and recommended_data_shape.
Iterating over AbstractContainer objects is not yet supported.
- abstract __iter__()¶
Return the iterator object
- abstract __next__()¶
Return the next data chunk or raise a StopIteration exception if all chunks have been retrieved.
HINT: numpy.s_ provides a convenient way to generate index tuples using standard array slicing. This is often useful to define the DataChunk.selection of the current chunk
- Returns:
DataChunk object with the data and selection of the current chunk
- Return type:
- abstract recommended_chunk_shape()¶
Recommend the chunk shape for the data array.
- Returns:
NumPy-style shape tuple describing the recommended shape for the chunks of the target array or None. This may or may not be the same as the shape of the chunks returned in the iteration process.
- abstract recommended_data_shape()¶
Recommend the initial shape for the data array.
This is useful in particular to avoid repeated resizing of the target array when reading from this data iterator. This should typically be either the final size of the array or the known minimal shape of the array.
- Returns:
NumPy-style shape tuple indicating the recommended initial shape for the target array. This may or may not be the final full shape of the array, i.e., the array is allowed to grow. This should not be None.
- abstract property dtype¶
Define the data type of the array
- Returns:
NumPy style dtype or otherwise compliant dtype string
- abstract property maxshape¶
Property describing the maximum shape of the data array that is being iterated over
- Returns:
NumPy-style shape tuple indicating the maximum dimensions up to which the dataset may be resized. Axes with None are unlimited.
- class hdmf.data_utils.GenericDataChunkIterator(buffer_gb=None, buffer_shape=None, chunk_mb=None, chunk_shape=None, display_progress=False, progress_bar_class=None, progress_bar_options=None)¶
Bases:
AbstractDataChunkIterator
DataChunkIterator that lets the user specify chunk and buffer shapes.
Break a dataset into buffers containing multiple chunks to be written into an HDF5 dataset.
Basic users should set the buffer_gb argument to as much free RAM space as can be safely allocated. Advanced users are offered full control over the shape parameters for the buffer and the chunks; however, the chunk shape must perfectly divide the buffer shape along each axis.
HDF5 recommends chunk size in the range of 2 to 16 MB for optimal cloud performance. https://youtu.be/rcS5vt-mKok?t=621
- Parameters:
  - buffer_gb (float or int) – If buffer_shape is not specified, it will be inferred as the smallest chunk below the buffer_gb threshold. Defaults to 1 GB.
  - buffer_shape (tuple) – Manually defined shape of the buffer.
  - chunk_mb (float or int) – If chunk_shape is not specified, it will be inferred as the smallest chunk below the chunk_mb threshold. Defaults to 10 MB.
  - chunk_shape (tuple) – Manually defined shape of the chunks.
  - display_progress (bool) – Display a progress bar with iteration rate and estimated completion time.
  - progress_bar_class (Callable) – The progress bar class to use. Defaults to tqdm.tqdm if the TQDM package is installed.
  - progress_bar_options (dict) – Dictionary of keyword arguments to be passed directly to tqdm.
- abstract _get_data(selection: Tuple[slice]) ndarray ¶
Retrieve the data specified by the selection using minimal I/O.
The developer of a new implementation of the GenericDataChunkIterator must ensure the data is actually loaded into memory, and not simply mapped.
- Parameters:
selection (Tuple[slice]) – tuple of slices, each indicating the selection indexed with respect to maxshape for that axis. Each axis of tuple is a slice of the full shape from which to pull data into the buffer.
- Returns:
Array of data specified by selection
- Return type:
- abstract _get_maxshape() Tuple[int, ...] ¶
Retrieve the maximum bounds of the data shape using minimal I/O.
- recommended_chunk_shape() Tuple[int, ...] ¶
Recommend the chunk shape for the data array.
- Returns:
NumPy-style shape tuple describing the recommended shape for the chunks of the target array or None. This may or may not be the same as the shape of the chunks returned in the iteration process.
- recommended_data_shape() Tuple[int, ...] ¶
Recommend the initial shape for the data array.
This is useful in particular to avoid repeatedly resizing the target array when reading from this data iterator. This should typically be either the final size of the array or the known minimal shape of the array.
- Returns:
NumPy-style shape tuple indicating the recommended initial shape for the target array. This may or may not be the final full shape of the array, i.e., the array is allowed to grow. This should not be None.
- class hdmf.data_utils.DataChunkIterator(data=None, maxshape=None, dtype=None, buffer_size=1, iter_axis=0)¶
Bases:
AbstractDataChunkIterator
Custom iterator class used to iterate over chunks of data.
This default implementation of AbstractDataChunkIterator accepts any iterable and assumes that we iterate over a single dimension of the data array (default: the first dimension). DataChunkIterator supports buffered read, i.e., multiple values from the input iterator can be combined to a single chunk. This is useful for buffered I/O operations, e.g., to improve performance by accumulating data in memory and writing larger blocks at once.
Note
DataChunkIterator assumes that the iterator that it wraps returns one element along the iteration dimension at a time. I.e., the iterator is expected to return chunks that are one dimension lower than the array itself. For example, when iterating over the first dimension of a dataset with shape (1000, 10, 10), then the iterator would return 1000 chunks of shape (10, 10) one-chunk-at-a-time. If this pattern does not match your use-case then using
GenericDataChunkIterator or AbstractDataChunkIterator may be more appropriate.
Initialize the DataChunkIterator.
If ‘data’ is an iterator and ‘dtype’ is not specified, then next is called on the iterator in order to determine the dtype of the data.
- Parameters:
data (None) – The data object used for iteration
maxshape (tuple) – The maximum shape of the full data array. Use None to indicate unlimited dimensions.
dtype (dtype) – The NumPy data type for the array.
buffer_size (int) – Number of values to be buffered in a chunk.
iter_axis (int) – The dimension to iterate over.
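As a hedged illustration of the constructor parameters above, the following sketch wraps a Python generator that yields one (10, 10) frame at a time; the generator name, shapes, and buffer size are illustrative only, not part of the HDMF API.
import numpy as np
from hdmf.data_utils import DataChunkIterator

# Hypothetical generator that yields one (10, 10) frame at a time
def stream_frames(n_frames=1000):
    for _ in range(n_frames):
        yield np.random.rand(10, 10)

# Declare maxshape and dtype up front so the iterator does not need to
# consume data just to infer them; buffer_size combines 10 frames per chunk.
dci = DataChunkIterator(data=stream_frames(),
                        maxshape=(None, 10, 10),  # unlimited along the iteration axis
                        dtype=np.dtype('float64'),
                        buffer_size=10)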
- classmethod from_iterable(data=None, maxshape=None, dtype=None, buffer_size=1, iter_axis=0)¶
- Parameters:
data (None) – The data object used for iteration
maxshape (tuple) – The maximum shape of the full data array. Use None to indicate unlimited dimensions.
dtype (dtype) – The NumPy data type for the array.
buffer_size (int) – Number of values to be buffered in a chunk.
iter_axis (int) – The dimension to iterate over.
- next()¶
Return the next data chunk or raise a StopIteration exception if all chunks have been retrieved.
Tip
numpy.s_ provides a convenient way to generate index tuples using standard array slicing. This is often useful to define the DataChunk.selection of the current chunk.
- Returns:
DataChunk object with the data and selection of the current chunk
- Return type:
- recommended_chunk_shape()¶
Recommend a chunk shape.
To optimize iterative write the chunk should be aligned with the common shape of chunks returned by __next__ or if those chunks are too large, then a well-aligned subset of those chunks. This may also be any other value in case one wants to recommend chunk shapes to optimize read rather than write. The default implementation returns None, indicating no preferential chunking option.
- recommended_data_shape()¶
- Recommend an initial shape of the data. This is useful when progressively writing data and we want to recommend an initial size for the dataset.
- property maxshape¶
Get a shape tuple describing the maximum shape of the array described by this DataChunkIterator.
Note
If an iterator is provided and no data has been read yet, then the first chunk will be read (i.e., next will be called on the iterator) in order to determine the maxshape. The iterator is expected to return single chunks along the iteration dimension, which means that maxshape will add an additional dimension along the iteration dimension. E.g., if we iterate over the first dimension and the iterator returns chunks of shape (10, 10), then the maxshape would be (None, 10, 10) or (len(self.data), 10, 10), depending on whether the size of the iteration dimension is known.
- Returns:
Shape tuple. None is used for dimensions where the maximum shape is not known or unlimited.
- property dtype¶
Get the value data type
- Returns:
np.dtype object describing the datatype
- class hdmf.data_utils.DataChunk(data=None, selection=None)¶
Bases:
object
Class used to describe a data chunk. Used in DataChunkIterator.
- Parameters:
data (ndarray) – NumPy array with the data value(s) of the chunk
selection (None) – NumPy index tuple describing the location of the chunk
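As a minimal sketch of the numpy.s_ tip above, the following constructs a DataChunk whose selection places ten (10, 10) frames at rows 100:110 of a target dataset; the shapes and indices are illustrative.
import numpy as np
from hdmf.data_utils import DataChunk

# numpy.s_ builds the index tuple used as DataChunk.selection
chunk = DataChunk(data=np.random.rand(10, 10, 10),
                  selection=np.s_[100:110, :, :])
print(chunk.dtype)  # dtype of the values carried by the chunk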
- astype(dtype)¶
Get a new DataChunk with the self.data converted to the given type
- property dtype¶
Data type of the values in the chunk
- Returns:
np.dtype of the values in the DataChunk
- hdmf.data_utils.assertEqualShape(data1, data2, axes1=None, axes2=None, name1=None, name2=None, ignore_undetermined=True)¶
Ensure that the shape of data1 and data2 match along the given dimensions
- Parameters:
data1 (List, Tuple, numpy.ndarray, DataChunkIterator) – The first input array
data2 (List, Tuple, numpy.ndarray, DataChunkIterator) – The second input array
name1 – Optional string with the name of data1
name2 – Optional string with the name of data2
axes1 (int, Tuple(int), List(int), None) – The dimensions of data1 that should be matched to the dimensions of data2. Set to None to compare all axes in order.
axes2 – The dimensions of data2 that should be matched to the dimensions of data1. Must have the same length as axes1. Set to None to compare all axes in order.
ignore_undetermined – Boolean indicating whether non-matching unlimited dimensions should be ignored, i.e., if two dimensions don't match because we can't determine the shape of either one, should we ignore that case or treat it as a mismatch?
- Returns:
Bool indicating whether the check passed and a string with a message about the matching process
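A small usage sketch; the array shapes are illustrative, and the return value is assumed to be a ShapeValidatorResult (described below) carrying the outcome and a descriptive message.
import numpy as np
from hdmf.data_utils import assertEqualShape

a = np.zeros((10, 5))
b = np.zeros((10, 3))

# Compare only axis 0 of both arrays; axis 1 differs but is not checked here
result = assertEqualShape(a, b, axes1=0, axes2=0)
print(result.result, result.message)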
- class hdmf.data_utils.ShapeValidatorResult(result=False, message=None, ignored=(), unmatched=(), error=None, shape1=(), shape2=(), axes1=(), axes2=())¶
Bases:
object
Class for storing results from validating the shape of multi-dimensional arrays.
This class is used to store results generated by ShapeValidator
- Variables:
result – Boolean indicating whether results matched or not
message – Message indicating the result of the matching procedure
- Parameters:
result (bool) – Result of the shape validation.
message (str) – Message describing the result of the shape validation.
ignored (tuple) – Axes that have been ignored in the validation process.
unmatched (tuple) – List of axes that did not match during shape validation.
error (str) – Error that may have occurred. One of ERROR_TYPE.
shape1 (tuple) – Shape of the first array for comparison.
shape2 (tuple) – Shape of the second array for comparison.
axes1 (tuple) – Axes for the first array that should match.
axes2 (tuple) – Axes for the second array that should match.
- SHAPE_ERROR = {'AXIS_LEN_ERROR': 'Unequal length of axes.', 'AXIS_OUT_OF_BOUNDS': 'Axis index for comparison out of bounds.', 'NUM_AXES_ERROR': 'Unequal number of axes for comparison.', 'NUM_DIMS_ERROR': 'Unequal number of dimensions.', None: 'All required axes matched'}¶
Dict where the Keys are the type of errors that may have occurred during shape comparison and the values are strings with default error messages for the type.
- class hdmf.data_utils.DataIO(data=None, dtype=None, shape=None)¶
Bases:
object
Base class for wrapping data arrays for I/O. Derived classes of DataIO are typically used to pass dataset-specific I/O parameters to the particular HDMFIO backend.
- Parameters:
data (ndarray or list or tuple or Dataset or Array or StrDataset or HDMFDataset or AbstractDataChunkIterator) – the data to be written
dtype (type or dtype) – the data type of the dataset. Not used if data is specified.
shape (tuple) – the shape of the dataset. Not used if data is specified.
- get_io_params()¶
Returns a dict with the I/O parameters specified in this DataIO.
- property data¶
Get the wrapped data object
- property dtype¶
Get the dtype of the wrapped data object
- property shape¶
Get the shape of the wrapped data object
- append(arg)¶
- extend(arg)¶
- __getitem__(item)¶
Delegate slicing to the data object
- property valid¶
bool indicating if the data object is valid
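A minimal sketch of wrapping an in-memory array with DataIO; in practice you would typically use a backend-specific subclass (such as H5DataIO for HDF5) to pass I/O options like compression.
import numpy as np
from hdmf.data_utils import DataIO

wrapped = DataIO(data=np.arange(12).reshape(3, 4))

print(wrapped.data.shape)  # the wrapped ndarray is accessible via .data
print(wrapped[0])          # __getitem__ delegates slicing to the wrapped data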
hdmf.utils module¶
- class hdmf.utils.AllowPositional(value)¶
Bases:
Enum
An enumeration.
- ALLOWED = 1¶
- WARNING = 2¶
- ERROR = 3¶
- hdmf.utils.docval_macro(macro)¶
Class decorator to add the class to a list of types associated with the key macro in the __macros dict
- hdmf.utils.get_docval_macro(key=None)¶
Return a deepcopy of the docval macros, i.e., strings that represent a customizable list of types for use in docval.
- Parameters:
key – Name of the macro. If key=None, then a dictionary of all macros is returned. Otherwise, a tuple of the types associated with the key is returned.
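For example, assuming 'array_data' is one of the built-in macros (HDMF uses it to refer to array-like types in docval specifications):
from hdmf.utils import get_docval_macro

# All registered macros as a dict of name -> tuple of types
print(get_docval_macro().keys())

# The tuple of types behind a single macro
print(get_docval_macro('array_data'))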
- hdmf.utils.check_type(value, argtype, allow_none=False)¶
Check a value against a type
The difference between this function and isinstance is that it allows specifying a type as a string. Furthermore, strings allow for specifying more general types, such as a simple numeric type (i.e., argtype="num").
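A quick sketch of the difference, assuming the function returns a boolean as its name and description suggest:
from hdmf.utils import check_type

print(check_type(5, int))                      # check against a concrete type
print(check_type(5.0, 'num'))                  # 'num' is a string alias for numeric types
print(check_type(None, str, allow_none=True))  # None is accepted when allow_none=True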
- hdmf.utils.get_docval(func, *args)¶
Get a copy of docval arguments for a function. If args are supplied, return only docval arguments with value for ‘name’ key equal to the args
- hdmf.utils.fmt_docval_args(func, kwargs)¶
Separate positional and keyword arguments
Useful for methods that wrap other methods
- hdmf.utils.call_docval_func(func, kwargs)¶
Call the function with only the keyword arguments that are accepted by the function’s docval.
Extra keyword arguments are not passed to the function unless the function’s docval has allow_extra=True.
- hdmf.utils.docval(*validator, **options)¶
A decorator for documenting and enforcing type for instance method arguments.
This decorator takes a list of dictionaries that specify the method parameters. These dictionaries are used for enforcing type and building a Sphinx docstring.
The first arguments are dictionaries that specify the positional arguments and keyword arguments of the decorated function. These dictionaries must contain the following keys: 'name', 'type', and 'doc'. This will define a positional argument. To define a keyword argument, specify a default value using the key 'default'. To validate the dimensions of an input array, add the optional 'shape' parameter. To allow a None value for an argument, either the default value must be None or a different default value must be provided and 'allow_none': True must be passed.
The decorated method must take self and **kwargs as arguments.
When using this decorator, the functions getargs and popargs can be used for easily extracting arguments from kwargs.
The following code example demonstrates the use of this decorator:
@docval({'name': 'arg1', 'type': str, 'doc': 'this is the first positional argument'},
        {'name': 'arg2', 'type': int, 'doc': 'this is the second positional argument'},
        {'name': 'kwarg1', 'type': (list, tuple), 'doc': 'this is a keyword argument', 'default': list()},
        returns='foo object', rtype='Foo')
def foo(self, **kwargs):
    arg1, arg2, kwarg1 = getargs('arg1', 'arg2', 'kwarg1', kwargs)
    ...
- Parameters:
enforce_type – Enforce types of input parameters (Default=True)
returns – String describing the return values
rtype – String describing the data type of the return values
is_method – True if this is decorating an instance or class method, False otherwise (Default=True)
enforce_shape – Enforce the dimensions of input arrays (Default=True)
validator – dict objects specifying the method parameters
allow_extra – Allow extra arguments (Default=False)
allow_positional – Allow positional arguments (Default=True)
options – additional options for documenting and validating method parameters
- hdmf.utils.getargs(*argnames, argdict)¶
Convenience function to retrieve arguments from a dictionary in batch.
The last argument should be a dictionary, and the other arguments should be the keys (argument names) for which to retrieve the values.
- Raises:
ValueError – if an argument name is not found in the dictionary, if there is only one argument passed to this function, or if the last argument is not a dictionary
- Returns:
a single value if there is only one argument, or a list of values corresponding to the given argument names
- hdmf.utils.popargs(*argnames, argdict)¶
Convenience function to retrieve and remove arguments from a dictionary in batch.
The last argument should be a dictionary, and the other arguments should be the keys (argument names) for which to retrieve the values.
- Raises:
ValueError – if an argument name is not found in the dictionary, if there is only one argument passed to this function, or if the last argument is not a dictionary
- Returns:
a single value if there is only one argument, or a list of values corresponding to the given argument names
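A short sketch contrasting the two helpers; both take the argument names followed by the dictionary as the last positional argument. The dictionary contents here are illustrative.
from hdmf.utils import getargs, popargs

kwargs = {'name': 'widget', 'count': 3, 'verbose': True}

# getargs reads values without modifying the dictionary
name, count = getargs('name', 'count', kwargs)

# popargs reads values and removes them from the dictionary
verbose = popargs('verbose', kwargs)
print(kwargs)  # {'name': 'widget', 'count': 3}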
- hdmf.utils.popargs_to_dict(keys, argdict)¶
Convenience function to retrieve and remove arguments from a dictionary in batch into a dictionary.
Same as {key: argdict.pop(key) for key in keys} with a custom ValueError
- Parameters:
keys (Iterable) – Iterable of keys to pull out of argdict
argdict – Dictionary to process
- Raises:
ValueError – if an argument name is not found in the dictionary
- Returns:
a dict of arguments removed
- class hdmf.utils.ExtenderMeta(name, bases, namespace, **kwargs)¶
Bases:
ABCMeta
A metaclass that will extend the base class initialization routine by executing additional functions defined in classes that use this metaclass
In general, this class should only be used by core developers.
- classmethod pre_init(func)¶
A decorator that sets a ‘__preinit’ attribute on the target function and then returns the function as a classmethod.
- hdmf.utils.get_data_shape(data, strict_no_data_load=False)¶
Helper function used to determine the shape of the given array.
In order to determine the shape of nested tuples, lists, and sets, this function recursively inspects elements along the dimensions, assuming that the data has a regular, rectangular shape. In the case of out-of-core iterators, this means that the first item along each dimension would potentially be loaded into memory. Set strict_no_data_load=True to enforce that this does not happen, at the cost that we may not be able to determine the shape of the array.
- Parameters:
data (List, numpy.ndarray, DataChunkIterator) – Array for which we should determine the shape. Can be any object that supports __len__ or .shape.
strict_no_data_load – If True and data is an out-of-core iterator, None may be returned. If False (default), the first element of data may be loaded into memory.
- Returns:
Tuple of ints indicating the size of known dimensions. Dimensions for which the size is unknown will be set to None.
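For example:
import numpy as np
from hdmf.utils import get_data_shape

print(get_data_shape([[1, 2, 3], [4, 5, 6]]))  # (2, 3) for a nested list
print(get_data_shape(np.zeros((4, 5))))        # (4, 5) for a NumPy array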
- hdmf.utils.pystr(s)¶
Convert a string of characters to Python str object
- hdmf.utils.to_uint_array(arr)¶
Convert a numpy array or array-like object to a numpy array of unsigned integers with the same dtype itemsize.
For example, a list of int32 values is converted to a numpy array with dtype uint32.
Raises ValueError if the input array contains values that are not integers or are negative.
- hdmf.utils.is_ragged(data)¶
Test whether a list of lists or array is ragged / jagged
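A short sketch, assuming the function returns a boolean as its description suggests:
from hdmf.utils import is_ragged

print(is_ragged([[1, 2], [3]]))     # True: rows have different lengths
print(is_ragged([[1, 2], [3, 4]]))  # False: the data is rectangular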
- class hdmf.utils.LabelledDict(label, key_attr='name', add_callable=None, remove_callable=None)¶
Bases:
dict
A dict wrapper that allows querying by an attribute of the values and running a callable on removed items.
For example, if the key attribute is set as ‘name’ in __init__, then all objects added to the LabelledDict must have a ‘name’ attribute and a particular object in the LabelledDict can be accessed using the syntax [‘object_name’] if the object.name == ‘object_name’. In this way, LabelledDict acts like a set where values can be retrieved using square brackets around the value of the key attribute. An ‘add’ method makes clear the association between the key attribute of the LabelledDict and the values of the LabelledDict.
LabelledDict also supports retrieval of values with the syntax my_dict[‘attr == val’], which returns a set of objects in the LabelledDict which have an attribute ‘attr’ with a string value ‘val’. If no objects match that condition, a KeyError is raised. Note that if ‘attr’ equals the key attribute, then the single matching value is returned, not a set.
LabelledDict does not support changing items that have already been set. A TypeError will be raised when using __setitem__ on keys that already exist in the dict. The setdefault and update methods are not supported. A TypeError will be raised when these are called.
A callable function may be passed to the constructor to be run on an item after adding it to this dict using the __setitem__ and add methods.
A callable function may be passed to the constructor to be run on an item after removing it from this dict using the __delitem__ (the del operator), pop, and popitem methods. It will also be run on each removed item when using the clear method.
- Usage:
LabelledDict(label='my_objects', key_attr='name')
my_dict[obj.name] = obj
my_dict.add(obj)  # simpler syntax
Example
# MyTestClass is a class with attributes 'prop1' and 'prop2'. MyTestClass.__init__ sets those attributes.
ld = LabelledDict(label='all_objects', key_attr='prop1')
obj1 = MyTestClass('a', 'b')
obj2 = MyTestClass('d', 'b')
ld[obj1.prop1] = obj1  # obj1 is added to the LabelledDict with the key obj1.prop1. Any other key is not allowed.
ld.add(obj2)           # the simpler 'add' syntax enforces the required relationship
ld['a']                # returns obj1
ld['prop1 == a']       # also returns obj1
ld['prop2 == b']       # returns set([obj1, obj2]) - the set of all values v in ld where v.prop2 == 'b'
- Parameters:
label (str) – the label on this dictionary
key_attr (str) – the attribute name to use as the key
add_callable (function) – function to call on an element after adding it to this dict using the add or __setitem__ methods
remove_callable (function) – function to call on an element after removing it from this dict using the pop, popitem, clear, or __delitem__ methods
- property label¶
Return the label of this LabelledDict
- property key_attr¶
Return the attribute used as the key for values in this LabelledDict
- __getitem__(args)¶
Get a value from the LabelledDict with the given key.
Supports syntax my_dict[‘attr == val’], which returns a set of objects in the LabelledDict which have an attribute ‘attr’ with a string value ‘val’. If no objects match that condition, an empty set is returned. Note that if ‘attr’ equals the key attribute of this LabelledDict, then the single matching value is returned, not a set.
- add(value)¶
Add a value to the dict with the key value.key_attr.
Raises ValueError if value does not have attribute key_attr.
- pop(k)¶
Remove an item that matches the key. If remove_callable was initialized, call that on the returned value.
- popitem()¶
Remove the last added item. If remove_callable was initialized, call that on the returned value.
Note: popitem returns a tuple (key, value) but the remove_callable will be called only on the value.
Note: in Python 3.5 and earlier, dictionaries are not ordered, so popitem removes an arbitrary item.
- clear()¶
Remove all items. If remove_callable was initialized, call that on each returned value.
The order of removal depends on the popitem method.
- setdefault(k)¶
setdefault is not supported. A TypeError will be raised.
- update(other)¶
update is not supported. A TypeError will be raised.
- class hdmf.utils.StrDataset(dset, encoding, errors='strict')¶
Bases:
Dataset
Wrapper to decode strings on reading the dataset
Create a new Dataset object by binding to a low-level DatasetID.
- __getitem__(args)¶
Read a slice from the HDF5 dataset.
Takes slices and recarray-style field names (more than one is allowed!) in any order. Obeys basic NumPy rules, including broadcasting.
Also supports:
Boolean “mask” array indexing
hdmf.validate package¶
Submodules¶
hdmf.validate.errors module¶
- class hdmf.validate.errors.Error(name, reason, location=None)¶
Bases:
object
- Parameters:
- property name¶
- property reason¶
- property location¶
- class hdmf.validate.errors.ExpectedArrayError(name, expected, received, location=None)¶
Bases:
Error
- class hdmf.validate.errors.MissingDataType(name, data_type, location=None, missing_dt_name=None)¶
Bases:
Error
- Parameters:
- property data_type¶
- class hdmf.validate.errors.IllegalLinkError(name, location=None)¶
Bases:
Error
A validation error for indicating that a link was used where an actual object (i.e. a dataset or a group) must be used
- class hdmf.validate.errors.IncorrectDataType(name, expected, received, location=None)¶
Bases:
Error
A validation error for indicating that the incorrect data_type (not dtype) was used.
hdmf.validate.validator module¶
- hdmf.validate.validator.check_type(expected, received, string_format=None)¶
expected should come from the spec; received should come from the data
- hdmf.validate.validator.get_iso8601_regex()¶
- hdmf.validate.validator.get_string_format(data)¶
Return the string format of the given data. Possible outputs are “isodatetime” and None.
- hdmf.validate.validator.get_type(data, builder_dtype=None)¶
Return a tuple of (the string representation of the type, the format of the string data) for the given data.
- hdmf.validate.validator.check_shape(expected, received)¶
- class hdmf.validate.validator.ValidatorMap(namespace)¶
Bases:
object
A class for keeping track of Validator objects for all data types in a namespace
- Parameters:
namespace (SpecNamespace) – the namespace to build the validator map for
- property namespace¶
- valid_types(spec)¶
Get all valid types for a given data type
- get_validator(data_type)¶
Return the validator for a given data type
- Parameters:
data_type (BaseStorageSpec or str) – the data type to get the validator for
- validate(builder)¶
Validate a builder against a Spec.
builder must have the attribute used to specify the data type by the namespace used to construct this ValidatorMap.
- Parameters:
builder (BaseBuilder) – the builder to validate
- Returns:
a list of errors found
- Return type:
- class hdmf.validate.validator.Validator(spec, validator_map)¶
Bases:
object
A base class for classes that will be used to validate against Spec subclasses
- Parameters:
spec (Spec) – the specification to use to validate
validator_map (ValidatorMap) – the ValidatorMap to use during validation
- property spec¶
- property vmap¶
- abstract validate(value)¶
- Parameters:
value (None) – either in the form of a value or a Builder
- Returns:
a list of Errors
- Return type:
- classmethod get_spec_loc(spec)¶
- classmethod get_builder_loc(builder)¶
- class hdmf.validate.validator.AttributeValidator(spec, validator_map)¶
Bases:
Validator
A class for validating values against AttributeSpecs
- Parameters:
spec (AttributeSpec) – the specification to use to validate
validator_map (ValidatorMap) – the ValidatorMap to use during validation
- class hdmf.validate.validator.BaseStorageValidator(spec, validator_map)¶
Bases:
Validator
A base class for validating against Spec objects that have attributes i.e. BaseStorageSpec
- Parameters:
spec (BaseStorageSpec) – the specification to use to validate
validator_map (ValidatorMap) – the ValidatorMap to use during validation
- validate(builder)¶
- Parameters:
builder (BaseBuilder) – the builder to validate
- Returns:
a list of Errors
- Return type:
- class hdmf.validate.validator.DatasetValidator(spec, validator_map)¶
Bases:
BaseStorageValidator
A class for validating DatasetBuilders against DatasetSpecs
- Parameters:
spec (DatasetSpec) – the specification to use to validate
validator_map (ValidatorMap) – the ValidatorMap to use during validation
- validate(builder)¶
- Parameters:
builder (DatasetBuilder) – the builder to validate
- Returns:
a list of Errors
- Return type:
- class hdmf.validate.validator.GroupValidator(spec, validator_map)¶
Bases:
BaseStorageValidator
A class for validating GroupBuilders against GroupSpecs
- Parameters:
spec (GroupSpec) – the specification to use to validate
validator_map (ValidatorMap) – the ValidatorMap to use during validation
- validate(builder)¶
- Parameters:
builder (GroupBuilder) – the builder to validate
- Returns:
a list of Errors
- Return type:
- class hdmf.validate.validator.SpecMatches(spec)¶
Bases:
object
A utility class to hold a spec and the builders matched to it
- add(builder)¶
- class hdmf.validate.validator.SpecMatcher(vmap, specs)¶
Bases:
object
Matches a set of builders against a set of specs
This class is intended to isolate the task of choosing which spec a builder should be validated against from the task of performing that validation.
- property unmatched_builders¶
Returns the builders for which no matching spec was found
These builders can be considered superfluous, and will generate a warning in the future.
- property spec_matches¶
Returns a list of tuples of (spec, assigned builders)
- assign_to_specs(builders)¶
Assigns a set of builders against a set of specs (many-to-one)
In the case that no matching spec is found, a builder will be added to a list of unmatched builders.
Module contents¶
hdmf.testing package¶
Submodules¶
hdmf.testing.testcase module¶
- class hdmf.testing.testcase.TestCase(methodName='runTest')¶
Bases:
TestCase
Extension of unittest’s TestCase to add useful functions for unit testing in HDMF.
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.
- assertRaisesWith(exc_type, exc_msg, *args, **kwargs)¶
Asserts the given invocation raises the expected exception. This is similar to unittest’s assertRaises and assertRaisesRegex, but checks for an exact match.
- assertWarnsWith(warn_type, exc_msg, *args, **kwargs)¶
Asserts the given invocation raises the expected warning. This is similar to unittest’s assertWarns and assertWarnsRegex, but checks for an exact match.
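A small sketch of using the exact-match assertion inside a test class, assuming it supports the same context-manager form as its unittest counterpart:
from hdmf.testing import TestCase

class TestErrors(TestCase):

    def test_exact_error_message(self):
        msg = "value must be non-negative"
        with self.assertRaisesWith(ValueError, msg):
            raise ValueError(msg)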
- assertContainerEqual(container1, container2, ignore_name=False, ignore_hdmf_attrs=False, ignore_string_to_byte=False, message=None)¶
Asserts that the two AbstractContainers have equal contents. This applies to both Container and Data types.
- Parameters:
container1 (AbstractContainer) – First container
container2 (AbstractContainer) – Second container to compare with container 1
ignore_name – whether to ignore testing equality of name of the top-level container
ignore_hdmf_attrs – whether to ignore testing equality of HDMF container attributes, such as container_source and object_id
ignore_string_to_byte – ignore conversion of str to bytes and compare as unicode instead
message – custom additional message to show when assertions as part of this assert are failing
- assertBuilderEqual(builder1, builder2, check_path=True, check_source=True, message=None)¶
Test whether two builders are equal. Like assertDictEqual but also checks type, name, path, and source.
- Parameters:
builder1 (Builder) – The first builder
builder2 (Builder) – The second builder
check_path (bool) – Check that the builder.path values are equal
check_source (bool) – Check that the builder.source values are equal
message (str or None) – Custom message to add when any asserts as part of this assert are failing (default=None)
- class hdmf.testing.testcase.H5RoundTripMixin¶
Bases:
object
Mixin class for methods to run a roundtrip test writing a container to and reading the container from an HDF5 file. The setUp, test_roundtrip, and tearDown methods will be run by unittest.
The abstract method setUpContainer needs to be implemented by classes that include this mixin.
Example:
class TestMyContainerRoundTrip(H5RoundTripMixin, TestCase):

    def setUpContainer(self):
        # return the Container to read/write
NOTE: This class is a mix-in and not a subclass of TestCase so that unittest does not discover it, try to run it, and skip it.
- setUp()¶
- tearDown()¶
- abstract setUpContainer()¶
Return the Container to read/write.
- test_roundtrip()¶
Test whether the container read from a written file is the same as the original file.
- test_roundtrip_export()¶
Test whether the container read from a written and then exported file is the same as the original file.
- roundtripContainer(cache_spec=False)¶
Write the container to an HDF5 file, read the container from the file, and return it.
- roundtripExportContainer(cache_spec=False)¶
Write the container to an HDF5 file, read it, export it to a new file, read that file, and return it.
- validate(experimental=False)¶
Validate the written and exported files, if they exist.
hdmf.testing.utils module¶
- hdmf.testing.utils.remove_test_file(path)¶
A helper function for removing intermediate test files
This checks if the environment variable CLEAN_HDMF has been set to False before removing the file. If CLEAN_HDMF is set to False, it does not remove the file.
hdmf.testing.validate_spec module¶
- hdmf.testing.validate_spec.validate_spec(fpath_spec, fpath_schema)¶
Validate a yaml specification file against the json schema file that defines the specification language. Can be used to validate changes to the NWB and HDMF core schemas, as well as any extensions to either.
- Parameters:
fpath_spec – path-like
fpath_schema – path-like
- hdmf.testing.validate_spec.main()¶
Module contents¶
hdmf package¶
Subpackages¶
Submodules¶
hdmf.array module¶
- class hdmf.array.AbstractSortedArray(data)¶
Bases:
Array
An abstract class for representing a sorted array
- abstract find_point(val)¶
- get_data()¶
- class hdmf.array.SortedArray(array)¶
Bases:
AbstractSortedArray
A class for wrapping sorted arrays. This class overrides <, >, <=, >=, ==, and != to leverage the sorted content for efficiency.
- find_point(val)¶
- class hdmf.array.LinSpace(start, stop, step)¶
Bases:
SortedArray
- find_point(val)¶
hdmf.monitor module¶
- class hdmf.monitor.DataChunkProcessor(data)¶
Bases:
AbstractDataChunkIterator
Initialize the DataChunkIterator
- Parameters:
data (DataChunkIterator) – the DataChunkIterator to analyze
- recommended_chunk_shape()¶
Recommend the chunk shape for the data array.
- Returns:
NumPy-style shape tuple describing the recommended shape for the chunks of the target array or None. This may or may not be the same as the shape of the chunks returned in the iteration process.
- recommended_data_shape()¶
Recommend the initial shape for the data array.
This is useful in particular to avoid repeatedly resizing the target array when reading from this data iterator. This should typically be either the final size of the array or the known minimal shape of the array.
- Returns:
NumPy-style shape tuple indicating the recommended initial shape for the target array. This may or may not be the final full shape of the array, i.e., the array is allowed to grow. This should not be None.
- get_final_result(**kwargs)¶
Return the result of processing data fed by this DataChunkIterator
- abstract process_data_chunk(data_chunk)¶
- This method should take in a DataChunk and process it.
- Parameters:
data_chunk (DataChunk) – a chunk to process
- abstract compute_final_result()¶
- Return the result of processing this stream.
Should raise a NotYetExhausted exception.
- class hdmf.monitor.NumSampleCounter(data)¶
Bases:
DataChunkProcessor
Initialize the DataChunkIterator
- Parameters:
data (DataChunkIterator) – the DataChunkIterator to analyze
- compute_final_result()¶
hdmf.query module¶
- class hdmf.query.HDMFDataset(dataset)¶
Bases:
object
- Parameters:
dataset (ndarray or list or tuple or Dataset or Array or StrDataset or Array) – the HDF5 dataset to lazily evaluate
- __getitem__(key)¶
- property dataset¶
- property dtype¶
- next()¶
- class hdmf.query.ReferenceResolver¶
Bases:
object
A base class for classes that resolve references
- abstract classmethod get_inverse_class()¶
Return the class that represents the ReferenceResolver that resolves references to the opposite type.
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
- abstract invert()¶
Return an object that defers reference resolution but in the opposite direction.
- class hdmf.query.BuilderResolver¶
Bases:
ReferenceResolver
A reference resolver that resolves references to Builders
Subclasses should implement the invert method and the get_inverse_class classmethod
BuilderResolver.get_inverse_class should return a class that subclasses ContainerResolver.
- class hdmf.query.ContainerResolver¶
Bases:
ReferenceResolver
A reference resolver that resolves references to Containers
Subclasses should implement the invert method and the get_inverse_class classmethod
ContainerResolver.get_inverse_class should return a class that subclasses BuilderResolver.
hdmf.region module¶
- class hdmf.region.RegionSlicer(target, slice)¶
Bases:
DataRegion
An abstract base class for retrieving data using a region
Subclasses must implement __getitem__ and __len__
- Parameters:
target (None) – the target to slice
slice (None) – the region to slice
- property data¶
The target data. Same as self.target
- property region¶
The selected region. Same as self.slice
- property target¶
The target data
- property slice¶
The selected slice
- abstract property __getitem__¶
Must be implemented by subclasses
- abstract property __len__¶
Must be implemented by subclasses
- class hdmf.region.ListSlicer(dataset, region)¶
Bases:
RegionSlicer
Implementation of RegionSlicer for slicing Lists and Data
- Parameters:
- __getitem__(idx)¶
Get data values from selected data
hdmf.term_set module¶
- class hdmf.term_set.TermSet(term_schema_path: str | None = None, schemasheets_folder: str | None = None, dynamic: bool = False)¶
Bases:
object
Class for implementing term sets from ontologies and other resources used to define the meaning and/or identity of terms.
- Variables:
term_schema_path – The path to the LinkML YAML enumeration schema
sources – The prefixes for the ontologies used in the TermSet
view – SchemaView of the term set schema
schemasheets_folder – The path to the folder containing the LinkML TSV files
expanded_termset_path – The path to the schema with the expanded enumerations
- Parameters:
term_schema_path – The path to the LinkML YAML enumeration schema
schemasheets_folder – The path to the folder containing the LinkML TSV files
dynamic – Boolean parameter denoting whether the schema uses Dynamic Enumerations
- validate(term)¶
Validate a term in a dataset against the term set.
- Parameters:
term (str) – term to be validated
- property view_set¶
Property method to return a view of all terms in the LinkML YAML Schema.
- __getitem__(term)¶
Method to retrieve a term and term information (LinkML description and LinkML meaning) from the set of terms.
- class hdmf.term_set.TermSetWrapper(termset, value, field=None)¶
Bases:
object
This class allows any HDF5 dataset or attribute to have a TermSet.
- Parameters:
- property value¶
- property field¶
- property termset¶
- property dtype¶
- __getitem__(val)¶
This is used when we want to index items.
- append(arg)¶
Delegate to the append method of the wrapped value.
- extend(arg)¶
Delegate to the extend method of the wrapped value.
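A hedged sketch of how the two classes fit together; the schema path and term values below are placeholders, and using TermSet requires the optional LinkML dependencies to be installed.
from hdmf.term_set import TermSet, TermSetWrapper

# Placeholder path to a LinkML YAML enumeration schema describing allowed terms
species = TermSet(term_schema_path='species_termset.yaml')

# Check a single term against the term set
print(species.validate('Homo sapiens'))

# Wrap a data value so that its contents are validated against the term set
wrapped = TermSetWrapper(value=['Homo sapiens'], termset=species)
print(wrapped.value)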
- class hdmf.term_set.TypeConfigurator(path=None)¶
Bases:
object
This class allows users to toggle on/off a global configuration for defined data types. When toggled on, every instance of a configuration file supported data type will be validated according to the corresponding TermSet.
- Parameters:
path (str) – Path to the configuration file.
- get_config(data_type, namespace)¶
Return the config for that data type in the given namespace.
- load_type_config(config_path)¶
Load the configuration file for validation on the fields defined for the objects within the file.
- Parameters:
config_path (str) – Path to the configuration file.
- unload_type_config()¶
Remove validation according to termset configuration file.
Module contents¶
- hdmf.get_region_slicer(dataset, region)¶
- Parameters:
dataset (None) – the HDF5 dataset to slice
region (None) – the region reference to use to slice
Extending Standards¶
The following page will discuss how to extend a standard using HDMF.
Creating new Extensions¶
Standards specified using HDMF are designed to be extended. Extensions to a standard can be created using classes
provided in the hdmf.spec module. The classes GroupSpec, DatasetSpec, AttributeSpec, and LinkSpec
can be used to define custom types.
Attribute Specifications¶
Specifying attributes is done with AttributeSpec
.
from hdmf.spec import AttributeSpec
spec = AttributeSpec('bar', 'a value for bar', 'float')
Dataset Specifications¶
Specifying datasets is done with DatasetSpec
.
from hdmf.spec import AttributeSpec, DatasetSpec
spec = DatasetSpec('A custom data type',
                   name='qux',
                   attributes=[
                       AttributeSpec('baz', 'a value for baz', 'str'),
                   ],
                   shape=(None, None))
Using datasets to specify tables¶
Tables can be specified using DtypeSpec
. To specify a table, provide a
list of DtypeSpec
objects to the dtype argument.
from hdmf.spec import AttributeSpec, DatasetSpec, DtypeSpec
spec = DatasetSpec('A custom data type',
                   name='qux',
                   attributes=[
                       AttributeSpec('baz', 'a value for baz', 'str'),
                   ],
                   dtype=[
                       DtypeSpec('foo', 'column for foo', 'int'),
                       DtypeSpec('bar', 'a column for bar', 'float')
                   ])
Group Specifications¶
Specifying groups is done with the GroupSpec
class.
from hdmf.spec import GroupSpec
spec = GroupSpec('A custom data type',
name='quux',
attributes=[...],
datasets=[...],
groups=[...])
Data Type Specifications¶
GroupSpec
and DatasetSpec
use the arguments data_type_inc and
data_type_def for declaring new types and extending existing types. New types are specified by setting the argument
data_type_def. New types can extend an existing type by specifying the argument data_type_inc.
Create a new type
from hdmf.spec import GroupSpec
# A list of AttributeSpec objects to specify new attributes
addl_attributes = [...]
# A list of DatasetSpec objects to specify new datasets
addl_datasets = [...]
# A list of GroupSpec objects to specify new groups
addl_groups = [...]
spec = GroupSpec('A custom data type',
attributes=addl_attributes,
datasets=addl_datasets,
groups=addl_groups,
data_type_def='MyNewType')
Extend an existing type
from hdmf.spec import GroupSpec
# A list of AttributeSpec objects to specify additional attributes or attributes to be overridden
addl_attributes = [...]
# A list of DatasetSpec objects to specify additional datasets or datasets to be overridden
addl_datasets = [...]
# A list of GroupSpec objects to specify additional groups or groups to be overridden
addl_groups = [...]
spec = GroupSpec('An extended data type',
attributes=addl_attributes,
datasets=addl_datasets,
groups=addl_groups,
data_type_inc='SpikeEventSeries',
data_type_def='MyExtendedSpikeEventSeries')
Existing types can be instantiated by specifying data_type_inc alone.
from hdmf.spec import GroupSpec
# use another GroupSpec object to specify that a group of type
# ElectricalSeries should be present in the new type defined below
addl_groups = [ GroupSpec('An included ElectricalSeries instance',
data_type_inc='ElectricalSeries') ]
spec = GroupSpec('An extended data type',
groups=addl_groups,
data_type_inc='SpikeEventSeries',
data_type_def='MyExtendedSpikeEventSeries')
Datasets can be extended in the same manner (with regard to data_type_inc and data_type_def)
by using the class DatasetSpec.
Saving Extensions¶
Extensions are used by including them in a loaded namespace. Namespaces and extensions need to be saved to file
for downstream use. The class NamespaceBuilder
can be used to create new namespace and
specification files.
Create a new namespace with extensions
from hdmf.spec import GroupSpec, NamespaceBuilder
# create a builder for the namespace
ns_builder = NamespaceBuilder("Extension for use in my laboratory", "mylab", ...)
# create extensions
ext1 = GroupSpec('A custom SpikeEventSeries interface',
attributes=[...],
datasets=[...],
groups=[...],
data_type_inc='SpikeEventSeries',
data_type_def='MyExtendedSpikeEventSeries')
ext2 = GroupSpec('A custom EventDetection interface',
attributes=[...],
datasets=[...],
groups=[...],
data_type_inc='EventDetection',
data_type_def='MyExtendedEventDetection')
# add the extension
ext_source = 'mylab.specs.yaml'
ns_builder.add_spec(ext_source, ext1)
ns_builder.add_spec(ext_source, ext2)
# include an existing namespace - this will include all specifications in that namespace
ns_builder.include_namespace('collab_ns')
# save the namespace and extensions
ns_path = 'mylab.namespace.yaml'
ns_builder.export(ns_path)
Tip
Using the API to generate extensions (rather than writing YAML sources directly) helps avoid errors in the specification (e.g., due to missing required keys or invalid values) and ensure compliance of the extension definition with the HDMF specification language. It also helps with maintenance of extensions, e.g., if extensions have to be ported to newer versions of the specification language in the future.
Incorporating extensions¶
HDMF supports extending existing data types. Extensions must be registered with HDMF to be used for reading and writing of custom data types.
The following code demonstrates how to load custom namespaces.
from hdmf import load_namespaces
namespace_path = 'my_namespace.yaml'
load_namespaces(namespace_path)
Note
This will register all namespaces defined in the file 'my_namespace.yaml'
.
Container : Representing custom data¶
To read and write custom data, corresponding Container
classes must be associated with their
respective specifications. Container
classes are associated with their respective
specification using the decorator register_class
.
The following code demonstrates how to associate a specification with the Container
class
that represents it.
from hdmf.common import register_class
from hdmf.container import Container
@register_class('MyExtension', 'my_namespace')
class MyExtensionContainer(Container):
    ...
register_class
can also be used as a function.
from hdmf.common import register_class
from hdmf.container import Container
class MyExtensionContainer(Container):
    ...
register_class(data_type='MyExtension', namespace='my_namespace', container_cls=MyExtensionContainer)
If you do not have a Container subclass to associate with your extension specification,
a class will be created dynamically by default.
To use the dynamic class, you will need to retrieve the class object using the function get_class
.
Once you have retrieved the class object, you can use it just like you would a statically defined class.
from hdmf.common import get_class
MyExtensionContainer = get_class('my_namespace', 'MyExtension')
my_ext_inst = MyExtensionContainer(...)
If using IPython, you can access documentation for the class’s constructor using the help command.
ObjectMapper : Customizing the mapping between Container and the Spec¶
If your Container
extension requires custom mapping of the
Container
class for reading and writing, you will need to implement and register a custom
ObjectMapper
.
ObjectMapper
extensions are registered with the decorator
register_map
.
from hdmf.common import register_map
from hdmf.build import ObjectMapper
@register_map(MyExtensionContainer)
class MyExtensionMapper(ObjectMapper):
    ...
register_map
can also be used as a function.
from hdmf.common import register_map
from hdmf.build import ObjectMapper
class MyExtensionMapper(ObjectMapper):
    ...
register_map(MyExtensionContainer, MyExtensionMapper)
Tip
ObjectMappers allow you to customize how objects in the spec are mapped to attributes of your Container in Python. This is useful, e.g., in cases where you want to customize the default mapping. For an overview of the concepts of containers, spec, builders, object mappers in HDMF see also Software Architecture
Documenting Extensions¶
Coming soon!
Further Reading¶
Specification Language: For a detailed overview of the specification language itself see https://hdmf-schema-language.readthedocs.io/en/latest/index.html
Building API Classes¶
After you have written an extension, you will need a Pythonic way to interact with the data model. To do this, you will need to write some classes that represent the data you defined in your specification extensions.
The hdmf.container
module defines two base classes that represent the primitive structures supported by
the schema. Data
represents datasets and Container
represents groups. See the classes in the hdmf.common package for examples.
The register_class function/decorator¶
When defining a class that represents a data_type (i.e. anything that has a data_type_def)
from your extension, you can tell HDMF which data_type it represents using the function
register_class
. This function can be called on its own, or used as a class decorator. The
first argument should be the data_type and the second argument should be the namespace name.
The following example demonstrates how to register a class as the Python class representation of the
data_type “MyContainer” from the namespace “my_ns”. The namespace must be loaded prior to the below code using
the load_namespaces
function.
from hdmf.common import register_class
from hdmf.container import Container
class MyContainer(Container):
    ...
register_class(data_type='MyContainer', namespace='my_ns', container_cls=MyContainer)
Alternatively, you can use register_class
as a decorator.
from hdmf.common import register_class
from hdmf.container import Container
@register_class('MyContainer', 'my_ns')
class MyContainer(Container):
    ...
register_class
is used with Data
the same way it is used with
Container
.
Export¶
Export is a new feature in HDMF 2.0. You can use export to take a container that was read from a file and write it to a different file, with or without modifications to the container in memory. The in-memory container being exported will be written to the exported file as if it was never read from a file.
To export a container, first read the container from a file, then create a new
HDF5IO
object for exporting the data, then call
export
on the
HDF5IO
object, passing in the IO object used to read the container
and optionally, the container itself, which may be modified in memory between reading and exporting.
For example:
with HDF5IO(self.read_path, manager=manager, mode='r') as read_io:
    with HDF5IO(self.export_path, mode='w') as export_io:
        export_io.export(src_io=read_io)
FAQ¶
Can I read a container from disk, modify it, and then export the modified container?¶
Yes, you can export the in-memory container after modifying it in memory. The modifications will appear in the exported file and not the read file.
If the modifications are removals or additions of containers, then no special action must be taken, as long as the container hierarchy is updated correspondingly.
If the modifications are changes to attributes, then
Container.set_modified()
must be called on the container before exporting.
with HDF5IO(self.read_path, manager=manager, mode='r') as read_io:
    container = read_io.read()
    # ... modify the container ...
    container.set_modified()  # this may be necessary if the modifications are changes to attributes
    with HDF5IO(self.export_path, mode='w') as export_io:
        export_io.export(src_io=read_io, container=container)
Note
Modifications to h5py.Dataset
objects act directly on the read file on disk.
Changes are applied immediately and do not require exporting or writing the file. If you want to modify a dataset
only in the new file, then you should replace the whole object with a new array holding the modified data. To
prevent unintentional changes to the source file, the source file should be opened with mode='r'
.
Can I export a newly instantiated container?¶
No, you can only export containers that have been read from a file. The src_io
argument is required in
HDMFIO.export
.
Can I read a container from disk and export only part of the container?¶
It depends. You can only export the root container from a file. To export the root container without certain other sub-containers in the hierarchy, you can remove those other containers before exporting. However, you cannot export only a sub-container of the container hierarchy.
Can I write a newly instantiated container to two different files?¶
HDMF does not allow you to write a container that was not read from a file to two different files. For example, if you instantiate container A and write it to file 1 and then try to write it to file 2, an error will be raised. However, you can read container A from file 1 and then export it to file 2, with or without modifications to container A in memory.
What happens to links when I export?¶
The exported file will not contain any links to the original file.
All links (such as internal links (i.e., HDF5 soft links) and links to other files (i.e., HDF5 external links)) will be preserved in the exported file.
If a link to an h5py.Dataset
in another file is added to the in-memory container after
reading it from file and then exported, then by default, the export process will create an external link to the
existing h5py.Dataset
object. To instead copy the data from the
h5py.Dataset
in another
file to the exported file, pass the keyword argument write_args={'link_data': False}
to
HDF5IO.export
. This is similar to passing the keyword argument
link_data=False
to HDF5IO.write
when writing a file with a
copy of externally linked datasets.
What happens to references when I export?¶
References will be preserved in the exported file. NOTE: Exporting a file involves loading into memory all datasets that contain references and attributes that are references. The HDF5 reference IDs within an exported file may differ from the reference IDs in the original file.
What happens to object IDs when I export?¶
After exporting a container, the object IDs of the container and its child containers will be identical to the object IDs of the read container and its child containers. The object ID of a container uniquely identifies the container within a file, but should not be used to distinguish between two different files.
If you would like all object IDs to change on export, then first call the method
generate_new_id
on the root container to generate
a new set of IDs for the root container and all of its children, recursively. Then export the container with its
new IDs. Note: calling the generate_new_id
method
changes the object IDs of the containers in memory. These changes are not reflected in the original file from
which the containers were read unless the HDF5IO.write
method is subsequently called.
with HDF5IO(self.read_path, manager=manager, mode='r') as read_io:
    container = read_io.read()
    container.generate_new_id()
    with HDF5IO(self.export_path, mode='w') as export_io:
        export_io.export(src_io=read_io, container=container)
Validating HDMF Data¶
Validation of NWB files is available through pynwb
. See the PyNWB documentation for more information.
Note
A simple interface for validating HDMF-structured data through the command line, like the one available for PyNWB files, is not yet
implemented. If you would like this functionality to be available through hdmf, then please upvote
this issue.
Support for the HDMF Specification Language¶
The HDMF API provides nearly full support for all features of the HDMF Specification Language version 3.0.0, except for the following:
Attributes containing multiple references (see #833)
Certain text and integer values for quantity (see #423, #531)
Datasets that do not have a data_type_inc/data_type_def and contain either a reference dtype or a compound dtype (see #737)
Passing dataset dtype and shape from parent data type to child data type (see #320)
Installing HDMF for Developers¶
Set up a virtual environment¶
For development, we recommend installing HDMF in a virtual environment in editable mode. You can use the venv tool that comes packaged with Python to create a new virtual environment. Or you can use the conda package and environment management system for managing virtual environments.
Option 1: Using venv¶
First, create a new virtual environment using the venv
tool. This
virtual environment will be stored in a new directory called "hdmf-env"
in the current directory.
python -m venv hdmf-env
On macOS or Linux, run the following to activate your new virtual environment:
source hdmf-env/bin/activate
On Windows, run the following to activate your new virtual environment:
hdmf-env\Scripts\activate
This virtual environment is a space where you can install Python packages that are isolated from other virtual environments. This is especially useful when working on multiple Python projects that have different package requirements and for testing Python code with different sets of installed packages or versions of Python.
Activate your newly created virtual environment using the above command whenever you want to work on HDMF. You can also
deactivate it using the deactivate
command to return to the base environment. And you can delete the virtual
environment by deleting the directory that was created.
Option 2: Using conda¶
The conda package and environment management system is an alternate way of managing virtual environments.
First, install Anaconda to install the conda
tool. Then create and
activate a new virtual environment called "hdmf-env"
with Python 3.12 installed.
conda create --name hdmf-env python=3.12
conda activate hdmf-env
Similar to a virtual environment created with venv
, a conda environment
is a space where you can install Python packages that are isolated from other virtual
environments. In general, you should use conda install
instead of pip install
to install packages
in a conda environment.
Activate your newly created virtual environment using the above command whenever you want to work on HDMF. You can also
deactivate it using the conda deactivate
command to return to the base environment. And you can delete the virtual
environment by using the conda remove --name hdmf-env --all
command.
Note
For advanced users, we recommend using Mambaforge, a faster version of the conda package manager that includes conda-forge as a default channel.
Install from GitHub¶
After you have created and activated a virtual environment, clone the HDMF git repository from GitHub, install the package requirements using the pip Python package manager, and install HDMF in editable mode.
git clone --recurse-submodules https://github.com/hdmf-dev/hdmf.git
cd hdmf
pip install -r requirements.txt -r requirements-dev.txt -r requirements-doc.txt -r requirements-opt.txt
pip install -e .
Note
When using conda
, you may use pip install
to install dependencies as shown above; however, it is generally
recommended that dependencies should be installed via conda install
.
Run tests¶
You can run the full test suite by running:
pytest
This will run all the tests and compute the test coverage. The coverage report can be found in the htmlcov/ directory.
You can also run a specific test module or class, or you can configure pytest
to start the
Python debugger (PDB) prompt on an error, e.g.,
pytest tests/unit/test_container.py # run all tests in the module
pytest tests/unit/test_container.py::TestContainer # run all tests in this class
pytest tests/unit/test_container.py::TestContainer::test_constructor # run this test method
pytest --pdb tests/unit/test_container.py # start pdb on error
You can run tests across multiple Python versions using the tox automated testing tool. Running tox
will
create a virtual environment, install dependencies, and run the test suite for different versions of Python.
This can take some time to run.
tox
You can also test that the Sphinx Gallery files run without warnings or errors by running:
python test_gallery.py
Install latest pre-release¶
To try out the latest features or set up continuous integration of your own project against the latest version of HDMF, install the latest release from GitHub.
pip install -U hdmf --find-links https://github.com/hdmf-dev/hdmf/releases/tag/latest --no-index
Contributing Guide¶
Code of Conduct¶
This project and everyone participating in it are governed by our code of conduct guidelines. By participating, you are expected to uphold this code. Please report unacceptable behavior.
Types of Contributions¶
Did you find a bug? or Do you intend to add a new feature or change an existing one?¶
Submit issues and requests using our issue tracker
Ensure the feature or change was not already reported by searching on GitHub under HDMF Issues
If you are unable to find an open issue addressing the problem then open a new issue on the respective repository. Be sure to use our issue templates and include:
brief and descriptive title
clear description of the problem you are trying to solve. Describing the use case is often more important than proposing a specific solution. Describing the use case and the problem you are trying to solve gives the development team and community a better understanding of the reasons for the change and enables others to suggest solutions.
context providing as much relevant information as possible and, if available, a code sample or an executable test case demonstrating the expected behavior and/or the problem.
Be sure to select the appropriate label (bug report or feature request) for your tickets so that they can be processed accordingly.
HDMF is currently being developed primarily by staff at scientific research institutions and in industry, most of whom work on many different research projects. Please be patient if our development team is not able to respond to your issues immediately. In particular, issues that belong to later project milestones may not be reviewed or processed until work on that milestone begins.
Did you write a patch that fixes a bug or implements a new feature?¶
See the Contributing Patches and Changes section below for details.
Did you fix whitespace, format code, or make a purely cosmetic patch in source code?¶
Source code changes that are purely cosmetic in nature and do not add anything substantial to the stability, functionality, or testability of the code will generally not be accepted unless they have been approved beforehand. One of the main reasons is that there are a lot of hidden costs in addition to writing the code itself, and with the limited resources of the project, we need to optimize developer time. For example, someone needs to test and review PRs, backporting of bug fixes gets harder, and such changes create noise and pollute the git repository, among other cost factors.
Do you have questions about HDMF?¶
See our hdmf-dev.github.io website for details.
Informal discussions between developers and users?¶
The https://nwb-users.slack.com Slack workspace is currently used for informal discussions between developers and users.
Contributing Patches and Changes¶
To contribute to HDMF you must submit your changes to the dev branch via a Pull Request.
From your local copy directory, use the following commands.
First create a new branch to work on
$ git checkout -b <new_branch>
Make your changes.
Push your feature branch to origin (i.e. GitHub)
$ git push origin <new_branch>
Once you have tested and finalized your changes, create a pull request targeting dev as the base branch. Be sure to use our pull request template and:
Ensure the PR description clearly describes the problem and solution.
Include the relevant issue number if applicable.
Before submitting, please ensure that:
The proposed changes include an addition to CHANGELOG.md describing your changes. To label the change with the PR number, you will have to first create the PR, then edit the CHANGELOG.md with the PR number, and push that change.
The code follows our coding style. This can be checked by running ruff from the source directory.
Note
Contributed branches will be removed by the development team after the merge is complete and should, hence, not be used after the pull request is complete.
Style Guides¶
Python Code Style Guide¶
Before you create a Pull Request, make sure you are following the HDMF style guide. To check whether your code conforms to the HDMF style guide, simply run the ruff tool in the project's root directory. ruff will sort Python imports automatically and check the codebase against PEP 8 and additional code style rules. We use the codespell tool to check spelling.
ruff and codespell are installed when you follow the developer installation instructions. See Installing HDMF for Developers.
$ ruff check .
$ codespell
Pre-Commit¶
We encourage developers to use the pre-commit tool to automatically process the codebase to follow the style guide, as well as to identify issues before making a commit. See installation and operation instructions in the pre-commit documentation.
Git Commit Message Styleguide¶
Use the present tense (“Add feature” not “Added feature”)
The first line should be short and descriptive.
Additional details may be included in further paragraphs.
If a commit fixes an issue, then include “Fix #X” where X is the number of the issue.
Reference relevant issues and pull requests liberally after the first line.
Documentation Styleguide¶
All documentation is written in reStructuredText (RST) using Sphinx.
Endorsement¶
Please do not take working with an organization (e.g., during a hackathon or via GitHub) as an endorsement of your work or your organization. It’s okay to say, e.g., “We worked with XXXXX to advance science” but not, e.g., “XXXXX supports our work on HDMF.”
License and Copyright¶
See the license files for details about the copyright and license.
As indicated in the HDMF license: “You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code (“Enhancements”) to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.”
Contributors to the HDMF code base are expected to use a permissive, non-copyleft open source license. Typically 3-clause BSD is used, but any compatible license is allowed; the MIT and Apache 2.0 licenses are good alternative choices. The GPL and other copyleft licenses are not allowed due to the consternation they generate across many organizations.
Also, make sure that you are permitted to contribute code. Some organizations, even academic organizations, have agreements in place that discuss IP ownership in detail (i.e., that address the rights to and ownership of IP you create while under the employ of the organization). These are typically signed documents that you looked at on your first day of work and then promptly forgot. We don’t want contributed code to be yanked later due to IP issues.
How to Make a Roundtrip Test¶
The HDMF test suite has tools for easily doing round-trip tests of container classes. These tools exist in the hdmf.testing module. Round-trip tests exist for the container classes in the hdmf.common module. We recommend you write any additional round-trip tests in the tests/unit/common subdirectory of the Git repository.
For executing your new tests, we recommend using the test.py script in the top of the Git repository. Roundtrip tests will get executed as part of the full test suite, which can be executed with the following command:
$ python test.py
The roundtrip test will generate a new HDMF file with the name test_<CLASS_NAME>.h5, where CLASS_NAME is the class name of the container class you are roundtripping. The test will write an HDMF file with an instance of the container to disk, read this instance back in, and compare it to the instance that was used for writing to disk. Once the test is complete, the HDMF file will be deleted. You can keep the HDMF file around after the test completes by setting the environment variable CLEAN_HDMF to 0, false, False, or FALSE. Setting CLEAN_HDMF to any value not listed here will cause the roundtrip HDMF file to be deleted once the test has completed.
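On a bash-like shell you would typically set the variable when invoking the test script. As a minimal Python-driven sketch (assuming you run it from the repository root where test.py lives):
import os
import subprocess

# keep the generated test_<CLASS_NAME>.h5 files after the roundtrip tests finish
env = dict(os.environ, CLEAN_HDMF="0")
subprocess.run(["python", "test.py"], check=True, env=env)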
Before writing tests, we also suggest you familiarize yourself with the software architecture of HDMF.
H5RoundTripMixin¶
To write a roundtrip test, you will need to subclass the H5RoundTripMixin class and the TestCase class, in that order, and override some of the instance methods of the H5RoundTripMixin class to test the process of going from in-memory Python object to data stored on disk and back.
setUpContainer¶
To configure the test for a particular container class, you need to override the setUpContainer method. This method should take no arguments and return an instance of the container class you are testing.
Here is an example using a CSRMatrix:
from hdmf.common import CSRMatrix
from hdmf.testing import TestCase, H5RoundTripMixin
import numpy as np


class TestCSRMatrixRoundTrip(H5RoundTripMixin, TestCase):

    def setUpContainer(self):
        data = np.array([1, 2, 3, 4, 5, 6])
        indices = np.array([0, 2, 2, 0, 1, 2])
        indptr = np.array([0, 2, 3, 6])
        return CSRMatrix(data, indices, indptr, (3, 3))
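The same pattern applies to other container classes. As a sketch, a roundtrip test for a DynamicTable (the class, table, and column names below are hypothetical) might look like this:
from hdmf.common import DynamicTable, VectorData
from hdmf.testing import TestCase, H5RoundTripMixin


class TestDynamicTableRoundTrip(H5RoundTripMixin, TestCase):

    def setUpContainer(self):
        # build a small table with a single column of example data
        col = VectorData(name="score", description="example scores", data=[1.0, 2.0, 3.0])
        return DynamicTable(name="test_table", description="a table for roundtrip testing",
                            columns=[col])
As with the CSRMatrix example above, the mixin takes care of writing the container to an HDMF file, reading it back in, and comparing the two instances.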
Software Process¶
Continuous Integration¶
HDMF is tested against Ubuntu, macOS, and Windows operating systems. The project has both unit and integration tests. Tests run on GitHub Actions.
Each time a PR is created or updated, the project is built, packaged, and tested on all supported operating systems and Python versions. That way, as a contributor, you know whether you introduced regressions or coding style inconsistencies.
There are badges in the README file which show the current condition of the dev branch.
Coverage¶
Code coverage is computed and reported using the coverage tool. There are two coverage-related badges in the README file. One shows the status of the GitHub Action workflow which runs the coverage tool and uploads the report to codecov, and the other badge shows the percentage coverage reported from codecov. A detailed report can be found on codecov, which shows, line by line, which parts of the code are covered by the tests.
Installation Requirements¶
pyproject.toml contains a list of package dependencies and their version ranges allowed for running HDMF. Because HDMF is a library, upper-bound version constraints tend to do more harm than good in the long term (see this blog post), so we avoid setting upper bounds on requirements.
If some of the packages are outdated, see How to Update Requirements Files.
Testing Requirements¶
There are several kinds of requirements files used for testing HDMF.
The first one is requirements-min.txt, which lists the package dependencies and their minimum versions for installing HDMF.
The second one is requirements.txt, which lists the pinned (concrete) dependencies to reproduce an entire development environment to use HDMF.
The third one is requirements-dev.txt, which lists the pinned (concrete) dependencies to reproduce an entire development environment to use HDMF, run HDMF tests, check code style, compute coverage, and create test environments.
The fourth one is requirements-opt.txt, which lists the pinned (concrete) optional dependencies to use all available features in HDMF.
The final one is environment-ros3.yml, which lists the dependencies used to test ROS3 streaming in HDMF.
Documentation Requirements¶
requirements-doc.txt lists the dependencies to generate the documentation for HDMF. Both this file and requirements.txt are used by ReadTheDocs to initialize the local environment for Sphinx to run.
Versioning and Releasing¶
HDMF uses setuptools_scm for versioning source and wheel distributions. setuptools_scm creates a semi-unique release name for the wheels that are created based on git tags. After all the tests pass, the “Deploy release” GitHub Actions workflow creates both a wheel (*.whl) and a source distribution (*.tar.gz) for Python 3 and uploads them back to GitHub as a release.
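To see the version string that setuptools_scm assigned to an installed build, one option is a quick check using only the Python standard library:
from importlib.metadata import version

# prints the installed HDMF version, e.g. a tagged release or a dev build string
print(version("hdmf"))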
It is important to note that GitHub automatically generates source code archives in .zip and .tar.gz formats and attaches those files to all releases as an asset. These files currently do not contain the submodules within HDMF and thus do not serve as a complete installation. For a complete source code archive, use the source distribution generated by GitHub Actions, typically named hdmf-{version}.tar.gz.
How to Make a Release¶
A core developer should use the following steps to create a release X.Y.Z of hdmf.
Note
Since the hdmf wheels do not include compiled code, they are considered pure and can be generated on any supported platform.
That said, because the instructions below have been tested on a Linux system, they may have to be adapted to work on macOS or Windows.
Prerequisites¶
All CI tests are passing on GitHub Actions.
You have a GPG signing key.
Dependency versions in requirements.txt, requirements-dev.txt, requirements-opt.txt, requirements-doc.txt, and requirements-min.txt are up-to-date.
Legal information and copyright dates in Legal.txt, license.txt, README.rst, docs/source/conf.py, and any other files are up-to-date.
Package information in setup.py is up-to-date.
README.rst information is up-to-date.
The hdmf-common-schema submodule is up-to-date. The version number should be checked manually in case syncing the git submodule does not work as expected.
Documentation reflects any new features and changes in HDMF functionality.
Documentation builds locally.
Documentation builds on the ReadTheDocs project on the “dev” build.
Release notes have been prepared.
An appropriate new version number has been selected.
Documentation conventions¶
The commands reported below should be evaluated in the same terminal session.
Commands to evaluate start with a dollar sign. For example:
$ echo "Hello"
Hello
means that echo "Hello" should be copied and evaluated in the terminal.
Publish release on PyPI: Step-by-step¶
Make sure that all CI tests are passing on GitHub Actions.
List all tags sorted by version.
$ git tag -l | sort -V
Choose the next release version number and store it in a variable.
$ release=X.Y.Z
Warning
To ensure the packages are uploaded on PyPI, tags must match this regular expression: ^[0-9]+.[0-9]+.[0-9]+$
Download the latest sources.
$ cd /tmp && git clone --recurse-submodules git@github.com:hdmf-dev/hdmf && cd hdmf
Tag the release.
$ git tag --sign -m "hdmf ${release}" ${release} origin/dev
Warning
This step requires a GPG signing key.
Publish the release tag.
$ git push origin ${release}
Important
This will trigger the “Deploy release” GitHub Actions workflow which will automatically upload the wheels and source distribution to both the HDMF PyPI project page and a new GitHub release using the hdmf-bot account.
Check the status of the builds on GitHub Actions.
Once the builds are completed, check that the distributions are available on HDMF PyPI project page and that a new GitHub release was created.
Copy the release notes from CHANGELOG.md to the newly created GitHub release.
Create a clean testing environment to test the installation.
On bash/zsh:
$ python -m venv hdmf-${release}-install-test && \
      source hdmf-${release}-install-test/bin/activate
On other shells, see the Python instructions for creating a virtual environment.
Test the installation:
$ pip install hdmf && \
      python -c "import hdmf; print(hdmf.__version__)"
Cleanup
On bash/zsh:
$ deactivate && \
      rm -rf dist/* && \
      rm -rf hdmf-${release}-install-test
Publish release on conda-forge: Step-by-step¶
Warning
Publishing on conda requires you to have the corresponding package version uploaded on PyPI. So you have to do the PyPI and GitHub release before you do the conda release.
Note
Conda-forge maintains a bot called “regro-cf-autotick-bot” that regularly monitors PyPI for new releases of packages that are also on conda-forge. When a new release is detected, usually within 24 hours of publishing on PyPI, the bot will create a Pull Request with the correct modifications to the version and sha256 values in meta.yaml. If the requirements in setup.py have been changed, then you need to modify the requirements/run section in meta.yaml manually to reflect these changes. Once tests pass, merge the PR, and a new release will be published on Anaconda cloud. This is the easiest way to update the package version on conda-forge.
In order to release a new version on conda-forge manually, follow the steps below:
Store the release version string (this should match the PyPI version that you already published).
$ release=X.Y.Z
Fork the hdmf-feedstock repository to your GitHub user account.
Clone the forked feedstock to your local filesystem.
Fill in the YOURGITHUBUSER part.
$ cd /tmp && git clone https://github.com/YOURGITHUBUSER/hdmf-feedstock.git
Download the corresponding source for the release version.
$ cd /tmp && \
      wget https://github.com/hdmf-dev/hdmf/releases/download/$release/hdmf-$release.tar.gz
Create a new branch.
$ cd hdmf-feedstock && \
      git checkout -b $release
Modify meta.yaml.
Update the version string (line 2) and sha256 (line 3). We have to modify the sha and the version string in the meta.yaml file (a Python alternative for computing the sha256 is sketched after this list).
For Linux flavors:
$ sed -i "2s/.*/{% set version = \"$release\" %}/" recipe/meta.yaml
$ sha=$(openssl sha256 /tmp/hdmf-$release.tar.gz | awk '{print $2}')
$ sed -i "3s/.*/{% set sha256 = \"$sha\" %}/" recipe/meta.yaml
For macOS:
$ sed -i -- "2s/.*/{% set version = \"$release\" %}/" recipe/meta.yaml
$ sha=$(openssl sha256 /tmp/hdmf-$release.tar.gz | awk '{print $2}')
$ sed -i -- "3s/.*/{% set sha256 = \"$sha\" %}/" recipe/meta.yaml
If the requirements in setup.py have been changed, then modify the requirements/run list in the meta.yaml file to reflect these changes.
Push the changes to your fork.
$ git push origin $release
Create a Pull Request.
Create a pull request against the main feedstock repository. After the tests pass, merge the PR, and a new release will be published on Anaconda cloud.
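If openssl is not available, the sha256 of the downloaded tarball can also be computed with the Python standard library. This is a minimal sketch; substitute X.Y.Z with the release version used above:
import hashlib

# compute the sha256 of the source tarball downloaded to /tmp earlier
path = "/tmp/hdmf-X.Y.Z.tar.gz"
with open(path, "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())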
How to Update Requirements Files¶
The different requirements files introduced in the Software Process section are the following:
requirements.txt¶
The project’s requirements.txt can be created or updated, and the resulting pinned dependencies captured, using the following script:
mkvirtualenv hdmf-requirements
cd hdmf
pip install .
pip check # check for package conflicts
pip freeze > requirements.txt
deactivate
rmvirtualenv hdmf-requirements
requirements-(dev|doc|opt).txt¶
Any of these requirements files can be updated using the following script:
cd hdmf
# Set the requirements file to update
target_requirements=requirements-dev.txt
mkvirtualenv hdmf-requirements
# Install updated requirements
pip install -U -r $target_requirements
# If relevant, you could pip install new requirements now
# pip install -U <name-of-new-requirement>
# Check for any conflicts in installed packages
pip check
# Update list of pinned requirements
pip freeze > $target_requirements
deactivate
rmvirtualenv hdmf-requirements
requirements-min.txt¶
Minimum requirements should be updated manually if a new feature or bug fix is added in a dependency that is required for proper running of HDMF. Minimum requirements should also be updated if a user requests that HDMF be installable with an older version of a dependency, all tests pass using the older version, and there is no valid reason for the minimum version to be as high as it is.
Copyright¶
“hdmf” Copyright (c) 2017-2024, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
If you have questions about your rights to use or distribute this software, please contact Berkeley Lab’s Innovation & Partnerships Office at IPO@lbl.gov.
NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.
License¶
“hdmf” Copyright (c) 2017-2024, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code (“Enhancements”) to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.