hdmf.common.resources module
- class hdmf.common.resources.KeyTable(name='keys', data=[])
Bases: Table
A table for storing keys used to reference external resources.
- Parameters
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.Key(key, table=None, idx=None)
Bases: Row
A Row class for representing rows in the KeyTable.
- Parameters
- todict()
- class hdmf.common.resources.ResourceTable(name='resources', data=[])
Bases: Table
A table for storing names and URIs of ontology sources.
- Parameters
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.Resource(resource, resource_uri, table=None, idx=None)
Bases: Row
A Row class for representing rows in the ResourceTable.
- Parameters
- todict()
- class hdmf.common.resources.EntityTable(name='entities', data=[])
Bases: Table
A table for storing the external resources a key refers to.
- Parameters
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- add_row(keys_idx, resources_idx, entity_id, entity_uri)
- Parameters
keys_idx (int or Key) – The index into the keys table for the user key that maps to the resource term / registry symbol.
resources_idx (int or Resource) – The index into the ResourceTable.
entity_id (str) – The unique ID for the resource term / registry symbol.
entity_uri (str) – The URI for the resource term / registry symbol.
- class hdmf.common.resources.Entity(keys_idx, resources_idx, entity_id, entity_uri, table=None, idx=None)
Bases: Row
A Row class for representing rows in the EntityTable.
- Parameters
keys_idx (int or Key) – The index into the keys table for the user key that maps to the resource term / registry symbol.
resources_idx (int or Resource) – The index into the ResourceTable.
entity_id (str) – The unique ID for the resource term / registry symbol.
entity_uri (str) – The URI for the resource term / registry symbol.
table (Table) – None
idx (int) – None
- todict()
- class hdmf.common.resources.ObjectTable(name='objects', data=[])
Bases: Table
A table for storing objects (i.e. Containers) that contain keys that refer to external resources.
- Parameters
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- add_row(object_id, relative_path, field)
- Parameters
object_id (str) – The object ID for the Container/Data.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource. Use an empty string if not applicable.
- class hdmf.common.resources.Object(object_id, relative_path, field, table=None, idx=None)
Bases: Row
A Row class for representing rows in the ObjectTable.
- Parameters
object_id (str) – The object ID for the Container/Data.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource. Use an empty string if not applicable.
table (Table) – None
idx (int) – None
- todict()
- class hdmf.common.resources.ObjectKeyTable(name='object_keys', data=[])
Bases: Table
A table for identifying which keys are used by which objects for referring to external resources.
- Parameters
name (str) – the name of this table
data (ndarray or list or tuple or Dataset or StrDataset or HDMFDataset or AbstractDataChunkIterator or DataIO) – the data in this table
- class hdmf.common.resources.ObjectKey(objects_idx, keys_idx, table=None, idx=None)
Bases: Row
A Row class for representing rows in the ObjectKeyTable.
- Parameters
- todict()
- class hdmf.common.resources.ExternalResources(name, keys=None, resources=None, entities=None, objects=None, object_keys=None, type_map=None)
Bases: Container
A table for mapping user terms (i.e. keys) to resource entities.
- Parameters
name (str) – The name of this ExternalResources container.
keys (KeyTable) – The table storing user keys for referencing resources.
resources (ResourceTable) – The table for storing names and URIs of resources.
entities (EntityTable) – The table storing entity information.
objects (ObjectTable) – The table storing object information.
object_keys (ObjectKeyTable) – The table storing object-resource relationships.
type_map (TypeMap) – The type map. If None is provided, the HDMF-common type map will be used.
- property keys
The table storing user keys for referencing resources.
- property resources
The table for storing names and URIs of resources.
- property entities
The table storing entity information.
- property objects
The table storing object information.
- property object_keys
The table storing object-resource relationships.
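As a rough illustration of how the five tables link to one another, the sketch below uses plain Python lists instead of the hdmf classes. The object IDs, the choice of storing "species" in relative_path, and the helper `entities_for_object` are assumptions made for illustration only; none of them are part of the hdmf API.

```python
# Plain-Python stand-ins for the five tables (NOT the hdmf classes).
# A row's position in its list plays the role of its index.
keys = ["Mus musculus", "Homo sapiens"]
resources = [("NCBI_Taxonomy", "https://www.ncbi.nlm.nih.gov/taxonomy")]
entities = [  # (keys_idx, resources_idx, entity_id, entity_uri)
    (0, 0, "NCBI:txid10090",
     "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090"),
    (1, 0, "NCBI:txid9606",
     "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606"),
]
objects = [  # (object_id, relative_path, field); hypothetical values
    ("object-A", "species", ""),
    ("object-B", "species", ""),
]
object_keys = [(0, 0), (1, 1)]  # (objects_idx, keys_idx)


def entities_for_object(object_id):
    """Follow objects -> object_keys -> keys -> entities for one object ID."""
    found = []
    for objects_idx, (oid, _relative_path, _field) in enumerate(objects):
        if oid != object_id:
            continue
        for link_objects_idx, keys_idx in object_keys:
            if link_objects_idx != objects_idx:
                continue
            for entity_keys_idx, resources_idx, entity_id, _uri in entities:
                if entity_keys_idx == keys_idx:
                    resource_name = resources[resources_idx][0]
                    found.append((keys[keys_idx], resource_name, entity_id))
    return found


print(entities_for_object("object-A"))
# [('Mus musculus', 'NCBI_Taxonomy', 'NCBI:txid10090')]
```

Each table refers to rows of another table only by index, which is why the same key or resource never has to be stored twice.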
- static assert_external_resources_equal(left, right, check_dtype=True)
Check that the keys, resources, entities, objects, and object_keys tables of two ExternalResources objects match.
- Parameters
left – ExternalResources object to compare with right
right – ExternalResources object to compare with left
check_dtype – Enforce strict checking of dtypes. Dtypes may differ, for example, for ids, which may change from int64 to int32 depending on how the data was saved. (Default: True)
- Returns
The function returns True if all values match. If mismatches are found, AssertionError will be raised.
- Raises
AssertionError – Raised if any differences are found. The function collects all differences into a single error so that the assertion will indicate all found differences.
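The "collect all differences, then raise once" behaviour described above can be sketched in plain Python as follows. The function `assert_tables_equal` and the dict-of-lists table representation are hypothetical stand-ins and do not reflect how hdmf implements the comparison.

```python
def assert_tables_equal(left, right):
    """Compare two {table_name: rows} dicts and report every mismatch at once."""
    errors = []
    for name in sorted(set(left) | set(right)):
        if name not in left or name not in right:
            errors.append(f"{name}: table is missing on one side")
        elif left[name] != right[name]:
            errors.append(f"{name}: rows differ")
    if errors:
        # One error that lists all problems, rather than failing on the first.
        raise AssertionError("\n".join(errors))
    return True


same = {"keys": ["Mus musculus"], "objects": ["object-A"]}
print(assert_tables_equal(same, dict(same)))  # True
```

Raising a single error with every mismatch saves the caller from fixing one table, rerunning, and only then discovering the next difference.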
- get_key(key_name, container=None, relative_path='', field='')
Return a Key or a list of Key objects that correspond to the given key.
If container, relative_path, and field are provided, the Key that corresponds to the given name of the key for the given container, relative_path, and field is returned.
- Parameters
key_name (str) – The name of the Key to get.
container (str or AbstractContainer) – The Container/Data object that uses the key or the object id for the Container/Data object that uses the key.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource.
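The sketch below illustrates why a key name alone can be ambiguous and how the (container, relative_path, field) combination narrows it down. It is a plain-Python approximation: `key_uses`, `find_keys`, the object IDs, and the choice to raise ValueError are all assumptions for illustration, not hdmf behaviour.

```python
# (key_name, object_id, relative_path, field); hypothetical rows
key_uses = [
    ("mouse", "object-A", "species", ""),
    ("mouse", "object-B", "species", ""),
    ("human", "object-C", "species", ""),
]


def find_keys(key_name, object_id=None, relative_path="", field=""):
    """Return all row indices for key_name, or the single one used by an object."""
    matches = [i for i, row in enumerate(key_uses) if row[0] == key_name]
    if object_id is None:
        return matches  # possibly several rows share the same key name
    target = (object_id, relative_path, field)
    scoped = [i for i in matches if key_uses[i][1:] == target]
    if len(scoped) != 1:
        raise ValueError(f"expected exactly one match, found {len(scoped)}")
    return scoped[0]


print(find_keys("mouse"))                         # [0, 1]
print(find_keys("mouse", "object-B", "species"))  # 1
```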
- get_resource(resource_name)
Retrieve the resource object with the given resource_name.
- Parameters
resource_name (str) – The name of the resource.
- add_ref(container=None, attribute=None, field='', key=None, resources_idx=None, resource_name=None, resource_uri=None, entity_id=None, entity_uri=None)
Add information about an external reference used in this file.
The same key name may refer to different resources as long as it is not used more than once within the same object, relative_path, and field combination. This method does not support such functionality by default.
- Parameters
container (str or AbstractContainer) – The Container/Data object that uses the key or the object_id for the Container/Data object that uses the key.
attribute (str) – The attribute of the container for the external reference.
field (str) – The field of the compound data type using an external resource.
key (str or Key) – The name of the key or the Key object from the KeyTable for the key to add a resource for.
resources_idx (Resource) – The Resource from the ResourceTable.
resource_name (str) – The name of the resource to be created.
resource_uri (str) – The URI of the resource to be created.
entity_id (str) – The identifier for the entity at the resource.
entity_uri (str) – The URI for the identifier at the resource.
- get_object_resources(container, relative_path='', field='')
Get all entities/resources associated with an object.
- Parameters
container (str or AbstractContainer) – The Container/Data object that is linked to resources/entities.
relative_path (str) – The relative_path of the attribute of the object that uses an external resource reference key. Use an empty string if not applicable.
field (str) – The field of the compound data type using an external resource.
- get_keys(keys=None)
Return a DataFrame with information about keys used to make references to external resources.
The DataFrame will contain the following columns:
key_name: the key that will be used for referencing an external resource
resources_idx: the index into the ResourceTable
entity_id: the identifier for the entity at the external resource
entity_uri: the URI for the entity at the external resource
The same key_name may refer to different resources as long as it is not used more than once within the same object, relative_path, and field combination. This method does not support such functionality by default. To select specific keys, use the keys argument to pass in the Key object(s) representing the desired keys. Note that if the same key_name is used more than once, multiple calls to this method with different Key objects are required to keep the different instances separate. If a single call is made, it is left to the caller to distinguish the different instances.
- to_dataframe(use_categories=False)
Convert the data from the keys, resources, entities, objects, and object_keys tables to a single joint DataFrame. The data is denormalized in the process: for example, keys that are used across multiple entities or objects will be duplicated across the corresponding rows.
- Parameters
use_categories (bool) – Use a multi-index on the columns to indicate which category each column belongs to.
- Returns
A DataFrame with all data merged into a single, flat, denormalized table.
- Return type
DataFrame
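To make the denormalization concrete, here is a minimal plain-Python sketch of the join. It uses lists and dicts in place of pandas, and the table contents, the `flatten` helper, and the tuple-based column labels (standing in for a column multi-index) are assumptions for illustration only.

```python
keys = ["Mus musculus"]
entities = [(0, "NCBI:txid10090")]   # (keys_idx, entity_id)
objects = ["object-A", "object-B"]   # hypothetical object IDs
object_keys = [(0, 0), (1, 0)]       # (objects_idx, keys_idx): both use key 0


def flatten(use_categories=False):
    """Join the linked tables into one list of flat rows."""
    rows = []
    for objects_idx, keys_idx in object_keys:
        for entity_keys_idx, entity_id in entities:
            if entity_keys_idx != keys_idx:
                continue
            row = {
                ("objects", "object_id"): objects[objects_idx],
                ("keys", "key"): keys[keys_idx],
                ("entities", "entity_id"): entity_id,
            }
            if not use_categories:
                # Drop the table name and keep only the column name.
                row = {column: value for (_table, column), value in row.items()}
            rows.append(row)
    return rows


flat = flatten()
print(len(flat))                     # 2
print([row["key"] for row in flat])  # ['Mus musculus', 'Mus musculus']
```

The single key row appears twice in the flat result, once per object that uses it, which is the duplication the description refers to.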
- to_sqlite(db_file)
Save the keys, resources, entities, objects, and object_keys tables using sqlite3 to the given db_file.
The function first creates the tables (if they do not already exist) and then adds the data from this ExternalResources object to the database. If the database file already exists, the data is appended as rows to the existing database tables.
Note that the index values of foreign keys (e.g., keys_idx, objects_idx, resources_idx) will not match between this ExternalResources object and the exported database. They are adjusted automatically so that the foreign keys point to the correct rows in the exported database. The adjustment is needed because 1) ExternalResources uses 0-based indexing for foreign keys, whereas SQLite uses 1-based indexing, and 2) if data is appended to existing tables, a corresponding additional offset must be applied to the relevant foreign keys.
- Raises
The function raises an error if the connection to the database fails. If the given db_file already exists, certain updates may also result in errors if there are collisions between the new and existing data.
- Parameters
db_file (str) – Name of the SQLite database file
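The foreign-key adjustment can be sketched with the standard-library sqlite3 module. This is a simplified two-table illustration under assumed table and column definitions; the schema, the `append_rows` helper, and the use of an in-memory database are not taken from hdmf.

```python
import sqlite3


def append_rows(conn, keys, entities):
    """Append keys and (keys_idx, entity_id) rows, where keys_idx is 0-based."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS keys (id INTEGER PRIMARY KEY, key TEXT NOT NULL)"
    )
    cur.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        "id INTEGER PRIMARY KEY, keys_idx INTEGER NOT NULL, entity_id TEXT NOT NULL, "
        "FOREIGN KEY (keys_idx) REFERENCES keys (id))"
    )
    # Rows already in the database shift every new foreign key by this amount.
    offset = cur.execute("SELECT COALESCE(MAX(id), 0) FROM keys").fetchone()[0]
    cur.executemany("INSERT INTO keys (key) VALUES (?)", [(k,) for k in keys])
    cur.executemany(
        "INSERT INTO entities (keys_idx, entity_id) VALUES (?, ?)",
        # +1 converts 0-based to 1-based; +offset accounts for existing rows.
        [(keys_idx + 1 + offset, entity_id) for keys_idx, entity_id in entities],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
append_rows(conn, ["Mus musculus"], [(0, "NCBI:txid10090")])
append_rows(conn, ["Homo sapiens"], [(0, "NCBI:txid9606")])  # appended export

joined = conn.execute(
    "SELECT k.key, e.entity_id FROM entities e "
    "JOIN keys k ON k.id = e.keys_idx ORDER BY e.id"
).fetchall()
print(joined)
# [('Mus musculus', 'NCBI:txid10090'), ('Homo sapiens', 'NCBI:txid9606')]
```

In both calls the entity refers to key index 0 on the Python side, yet the stored foreign keys are 1 and 2, so each entity still joins to the correct key.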
- to_tsv(path)
Write ExternalResources as a single, flat table to TSV.
Internally, the function uses pandas.DataFrame.to_csv. Pandas can infer compression from the filename, so changing the file extension to ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, or ‘.zst’ writes a compressed file. The TSV is formatted as follows:
1) line one gives, for each column, the name of the table the column belongs to,
2) line two gives the name of the column within the table,
3) each subsequent line is a row in the flattened ExternalResources table.
The first column is the row id in the flattened table and has no label, i.e., the first and second lines start with a tab character, and subsequent rows are numbered sequentially (0, 1, … in the example below). For example (values are tab-separated in the file):
objects objects objects objects keys keys resources resources resources entities entities entities
objects_idx object_id relative_path field keys_idx key resources_idx resource resource_uri entities_idx entity_id entity_uri
0 0 1fc87200-e91e-45b3-978c-6d295af144c3 species 0 Mus musculus 0 NCBI_Taxonomy https://www.ncbi.nlm.nih.gov/taxonomy 0 NCBI:txid10090 https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090
1 0 9bf0c58e-09dc-4457-a652-94065b112c41 species 1 Homo sapiens 0 NCBI_Taxonomy https://www.ncbi.nlm.nih.gov/taxonomy 1 NCBI:txid9606 https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
See also from_tsv
- Parameters
path (str) – path of the TSV file to write
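The two-header-row layout can be reproduced with the standard-library csv module, as in the sketch below. Note that this swaps in `csv.writer` for the `pandas.DataFrame.to_csv` call that the function itself uses, and that the reduced column set, the row values, and the `write_flat_tsv` helper are assumptions for illustration.

```python
import csv
import io

# (table name, column name) pairs; a reduced, hypothetical column set
columns = [("objects", "object_id"), ("keys", "key"), ("entities", "entity_id")]
rows = [
    ("object-A", "Mus musculus", "NCBI:txid10090"),
    ("object-B", "Homo sapiens", "NCBI:txid9606"),
]


def write_flat_tsv(handle):
    """Write table names, then column names, then one numbered line per row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # The row-id column has no label, so both header lines start with a tab.
    writer.writerow([""] + [table for table, _column in columns])
    writer.writerow([""] + [column for _table, column in columns])
    for row_id, row in enumerate(rows):
        writer.writerow([row_id, *row])


buffer = io.StringIO()
write_flat_tsv(buffer)
lines = buffer.getvalue().splitlines()
print(repr(lines[0]))  # '\tobjects\tkeys\tentities'
print(repr(lines[2]))  # '0\tobject-A\tMus musculus\tNCBI:txid10090'
```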
- classmethod from_tsv(path)
Read ExternalResources from a flat TSV file.
Formatting of the TSV file is assumed to be consistent with the format generated by to_tsv. The function attempts to validate that the data in the TSV is consistent and parses the data from the denormalized table in the TSV into the normalized linked-table structure used by ExternalResources. Currently the checks focus on ensuring that row id links between tables are valid. Inconsistencies in other (non-index) fields (e.g., when two rows with the same resources_idx have different resource_uri values) are not checked and will be ignored. In this case, the value from the first row that contains the corresponding entry is kept.
Note
Since TSV files may be edited by hand or by other applications, the data in a TSV may be inconsistent. E.g., an objects_idx value may be missing if rows were removed and ids not updated. Also, since the TSV is flattened into a single denormalized table (i.e., data are stored with duplication rather than normalized across several tables), values may become inconsistent if edited outside of ExternalResources. E.g., there may be objects with the same index (objects_idx) but different object_id, relative_path, or field values. While flat TSVs are sometimes preferred for ease of sharing, editing the TSV without using the ExternalResources class should be done with great care!
- Parameters
path (str) – path of the TSV file to read
- Returns
ExternalResources loaded from TSV
- Return type
ExternalResources
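The two behaviours described above (index links are validated, while conflicting non-index values are silently resolved in favour of the first row) can be approximated in plain Python. The exact checks hdmf performs are not specified here, so the contiguity check, the `normalize_objects` helper, and the sample rows are assumptions for illustration.

```python
def normalize_objects(flat_rows):
    """Rebuild an objects table from (objects_idx, object_id) pairs.

    Raises ValueError if the indices do not form the range 0..n-1.
    """
    first_seen = {}
    for objects_idx, object_id in flat_rows:
        # Keep the first value per index; later conflicting values are ignored.
        first_seen.setdefault(objects_idx, object_id)
    expected = set(range(len(first_seen)))
    if set(first_seen) != expected:
        missing = sorted(expected - set(first_seen))
        raise ValueError(f"missing objects_idx values: {missing}")
    return [first_seen[i] for i in range(len(first_seen))]


# Index 0 appears twice with different IDs: the first one wins, no error.
print(normalize_objects([(0, "object-A"), (1, "object-B"), (0, "object-X")]))
# ['object-A', 'object-B']

# A row was deleted by hand, so index 1 is gone: this is reported.
try:
    normalize_objects([(0, "object-A"), (2, "object-C")])
except ValueError as err:
    print(err)  # missing objects_idx values: [1]
```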
- data_type = 'ExternalResources'
- namespace = 'hdmf-experimental'