.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/plot_term_set.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_plot_term_set.py:


TermSet
=======

This is a user guide for interacting with the
:py:class:`~hdmf.term_set.TermSet` and :py:class:`~hdmf.term_set.TermSetWrapper` classes.
The :py:class:`~hdmf.term_set.TermSet` and :py:class:`~hdmf.term_set.TermSetWrapper` types
are experimental and are subject to change in future releases. If you use these types,
please provide feedback to the HDMF team so that we can improve the structure and
overall capabilities.

Introduction
-------------
The :py:class:`~hdmf.term_set.TermSet` class provides a way for users to create their own
set of terms from brain atlases, species taxonomies, and anatomical, cell, and
gene function ontologies.

Users will be able to validate their data and attributes against their own set of terms, ensuring
clean data that can later be used in line with the FAIR principles.
The :py:class:`~hdmf.term_set.TermSet` class allows for a reusable and sharable
pool of metadata to serve as references for any dataset or attribute.
The :py:class:`~hdmf.term_set.TermSet` class is used closely with
:py:class:`~hdmf.common.resources.HERD` to more efficiently map terms to data.

In order to actually use a :py:class:`~hdmf.term_set.TermSet`, users will use the
:py:class:`~hdmf.term_set.TermSetWrapper` to wrap data and attributes. The
:py:class:`~hdmf.term_set.TermSetWrapper` uses a user-provided :py:class:`~hdmf.term_set.TermSet`
to perform validation.
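As a rough illustration of what "validating against a set of terms" means, the sketch below
checks values against a plain dictionary of permissible terms. It is a conceptual stand-in only
and is not how :py:class:`~hdmf.term_set.TermSetWrapper` is implemented; the names
``PERMISSIBLE_VALUES``, ``find_invalid_terms``, and ``validate_terms`` are hypothetical and do
not exist in HDMF.

.. code-block:: Python

    from typing import Dict, Iterable, List

    # Hypothetical permissible values, keyed by term name (illustrative only).
    PERMISSIBLE_VALUES: Dict[str, str] = {
        "Homo sapiens": "NCBI_TAXON:9606",
        "Mus musculus": "NCBI_TAXON:10090",
    }


    def find_invalid_terms(values: Iterable[str], permissible: Dict[str, str]) -> List[str]:
        """Return every value that is not a key in the permissible-term mapping."""
        return [value for value in values if value not in permissible]


    def validate_terms(values: Iterable[str], permissible: Dict[str, str]) -> None:
        """Raise ValueError if any value is not a permissible term."""
        invalid = find_invalid_terms(values, permissible)
        if invalid:
            raise ValueError(f"Invalid term(s): {invalid}")


    # Every value is a permissible term, so no exception is raised.
    validate_terms(["Homo sapiens", "Mus musculus"], PERMISSIBLE_VALUES)

    # 'Human' is not a permissible term, so it is reported as invalid.
    print(find_invalid_terms(["Homo sapiens", "Human"], PERMISSIBLE_VALUES))

The point of the sketch is only that validation is a membership check against a controlled
vocabulary; the real classes additionally carry term IDs, descriptions, and source links.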
:py:class:`~hdmf.term_set.TermSet` is built upon the resources from LinkML, a modeling
language that uses YAML-based schema, giving :py:class:`~hdmf.term_set.TermSet`
a standardized structure and a variety of tools to help the user manage their references.

How to make a TermSet Schema
----------------------------
Before the user can take advantage of the
:py:class:`~hdmf.term_set.TermSet` class, the user needs to create a LinkML schema (YAML) that provides
all the permissible term values. Please refer to https://linkml.io/linkml/intro/tutorial06.html
to learn more about how LinkML structures its schemas.

1. The name of the schema is up to the user, e.g., the name could be "Species" if the term set will
   contain species terms.
2. The prefixes will be the standardized prefix of the source, followed by the URI to the terms.
   For example, the NCBI Taxonomy is abbreviated as NCBI_TAXON, and Ensembl is simply Ensembl.
   The URI must point to the terms themselves so that it can later be combined with the term's
   source ID to form a valid link to the term's source page.
3. The schema uses LinkML enumerations to list all the possible terms. To define all the permissible
   values, the user can define them manually in the schema, transfer them from a Google spreadsheet,
   or pull them into the schema dynamically from a LinkML-supported source.

For a clear example, please view the
`example_term_set.yaml `_
for this tutorial, which provides a concise example of how a term set schema looks.

.. note::
    For more information regarding LinkML Enumerations, please refer to
    https://linkml.io/linkml/intro/tutorial06.html.

.. note::
    For more information on how to properly format the Google spreadsheet to be compatible with LinkML, please
    refer to https://linkml.io/schemasheets/#examples.

.. note::
    For more information on how to properly format the schema to support LinkML Dynamic Enumerations, please
    refer to https://linkml.io/linkml/schemas/enums.html#dynamic-enums.

.. GENERATED FROM PYTHON SOURCE LINES 68-109

.. code-block:: Python

    from hdmf.common import DynamicTable, VectorData
    import os
    import numpy as np

    try:
        import linkml_runtime  # noqa: F401
    except ImportError as e:
        raise ImportError("Please install linkml-runtime to run this example: pip install linkml-runtime") from e
    from hdmf.term_set import TermSet, TermSetWrapper

    try:
        # Resolve the example files relative to this script
        dir_path = os.path.dirname(os.path.abspath(__file__))
        yaml_file = os.path.join(dir_path, 'example_term_set.yaml')
        schemasheets_folder = os.path.join(dir_path, 'schemasheets')
        dynamic_schema_path = os.path.join(dir_path, 'example_dynamic_term_set.yaml')
    except NameError:
        # __file__ is undefined (e.g., in an interactive session):
        # build the paths from the parent of the working directory instead
        dir_path = os.path.dirname(os.path.abspath('.'))
        yaml_file = os.path.join(dir_path, 'gallery/example_term_set.yaml')
        schemasheets_folder = os.path.join(dir_path, 'gallery/schemasheets')
        dynamic_schema_path = os.path.join(dir_path, 'gallery/example_dynamic_term_set.yaml')

    # Use Schemasheets to create TermSet schema
    # -----------------------------------------
    # The :py:class:`~hdmf.term_set.TermSet` class builds off of LinkML Schemasheets, allowing users to convert
    # a Google spreadsheet into a complete LinkML schema. Once the user has defined the necessary LinkML metadata within the
    # spreadsheet, the spreadsheet needs to be saved as individual tsv files, i.e., one tsv file per spreadsheet tab. Please
    # refer to the Schemasheets tutorial link above for more details on the required syntax structure within the sheets.
    # Once the tsv files are in a folder, the user simply provides the path to the folder with ``schemasheets_folder``.
    termset = TermSet(schemasheets_folder=schemasheets_folder)

    # Use Dynamic Enumerations to populate TermSet
    # --------------------------------------------
    # The :py:class:`~hdmf.term_set.TermSet` class allows users to skip manually defining permissible values by pulling
    # them from a LinkML-supported source. These sources contain multiple ontologies. A user can select a node from an
    # ontology, and all the elements on the branch, starting from the chosen node, will be used as permissible values.
    # Please refer to the LinkML Dynamic Enumeration tutorial for more information on these sources and how to set up
    # Dynamic Enumerations within the schema. Once the schema is ready, the user provides a path to the schema and sets
    # ``dynamic=True``. A new schema, with the populated permissible values, will be created in the same directory.
    termset = TermSet(term_schema_path=dynamic_schema_path, dynamic=True)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading cl.db.gz: 0.00B [00:00, ?B/s]
    Downloading cl.db.gz: 0%| | 8.00k/97.4M [00:00<36:45, 46.3kB/s]
    Downloading cl.db.gz: 6%|▋ | 6.30M/97.4M [00:00<00:04, 22.1MB/s]
    Downloading cl.db.gz: 8%|▊ | 7.99M/97.4M [00:00<00:05, 16.7MB/s]
    Downloading cl.db.gz: 15%|█▍ | 14.3M/97.4M [00:00<00:02, 30.0MB/s]
    Downloading cl.db.gz: 16%|█▋ | 16.0M/97.4M [00:00<00:03, 26.1MB/s]
    Downloading cl.db.gz: 25%|██▍ | 24.0M/97.4M [00:00<00:02, 30.9MB/s]
    Downloading cl.db.gz: 31%|███ | 30.3M/97.4M [00:01<00:01, 37.2MB/s]
    Downloading cl.db.gz: 33%|███▎ | 32.4M/97.4M [00:01<00:02, 33.7MB/s]
    Downloading cl.db.gz: 39%|███▉ | 38.3M/97.4M [00:01<00:01, 37.7MB/s]
    Downloading cl.db.gz: 42%|████▏ | 40.6M/97.4M [00:01<00:01, 34.1MB/s]
    Downloading cl.db.gz: 48%|████▊ | 46.8M/97.4M [00:01<00:01, 33.4MB/s]
    Downloading cl.db.gz: 49%|████▉ | 48.0M/97.4M [00:01<00:01, 27.7MB/s]
    Downloading cl.db.gz: 56%|█████▌ | 54.7M/97.4M [00:01<00:01, 28.6MB/s]
    Downloading cl.db.gz: 58%|█████▊ | 56.0M/97.4M [00:02<00:02, 20.4MB/s]
    Downloading cl.db.gz: 64%|██████▍ | 62.3M/97.4M [00:02<00:01, 28.3MB/s]
    Downloading cl.db.gz: 67%|██████▋ | 65.2M/97.4M [00:02<00:01, 28.8MB/s]
    Downloading cl.db.gz: 72%|███████▏ | 70.3M/97.4M [00:02<00:00, 33.2MB/s]
    Downloading cl.db.gz: 74%|███████▍ | 72.0M/97.4M [00:02<00:00, 29.3MB/s]
    Downloading cl.db.gz: 82%|████████▏ | 80.0M/97.4M [00:02<00:00, 38.4MB/s]
    Downloading cl.db.gz: 89%|████████▊ | 86.3M/97.4M [00:02<00:00, 40.9MB/s]
    Downloading cl.db.gz: 90%|█████████ | 88.0M/97.4M [00:03<00:00, 31.7MB/s]
    Downloading cl.db.gz: 97%|█████████▋| 94.7M/97.4M [00:03<00:00, 38.0MB/s]


.. GENERATED FROM PYTHON SOURCE LINES 111-117

Viewing TermSet values
----------------------------------------------------
:py:class:`~hdmf.term_set.TermSet` provides ways to retrieve terms. The :py:func:`~hdmf.term_set.TermSet.view_set`
property returns a dictionary of all the terms and the corresponding information for each term.
Users can index specific terms from the :py:class:`~hdmf.term_set.TermSet`. LinkML runtime will need to be installed.
You can do so by first running ``pip install linkml-runtime``.

.. GENERATED FROM PYTHON SOURCE LINES 117-123

.. code-block:: Python

    terms = TermSet(term_schema_path=yaml_file)
    print(terms.view_set)

    # Retrieve a specific term
    terms['Homo sapiens']


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'Homo sapiens': Term_Info(id='NCBI_TAXON:9606', description='the species is human', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606'), 'Mus musculus': Term_Info(id='NCBI_TAXON:10090', description='the species is a house mouse', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090'), 'Ursus arctos horribilis': Term_Info(id='NCBI_TAXON:116960', description='the species is a grizzly bear', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=116960'), 'Myrmecophaga tridactyla': Term_Info(id='NCBI_TAXON:71006', description='the species is an anteater', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=71006')}

    Term_Info(id='NCBI_TAXON:9606', description='the species is human', meaning='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606')


.. GENERATED FROM PYTHON SOURCE LINES 124-129

Validate Data with TermSetWrapper
----------------------------------------------------
:py:class:`~hdmf.term_set.TermSetWrapper` can be wrapped around data.
To validate data, the user sets the data to the wrapped data. Validation must pass
for the data object to be created.

.. GENERATED FROM PYTHON SOURCE LINES 129-135

.. code-block:: Python

    data = VectorData(
        name='species',
        description='...',
        data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
    )


.. GENERATED FROM PYTHON SOURCE LINES 136-141

Validate Compound Data with TermSetWrapper
----------------------------------------------------
:py:class:`~hdmf.term_set.TermSetWrapper` can be wrapped around compound data.
The user sets the field within the compound data type that is to be validated
with the termset.

.. GENERATED FROM PYTHON SOURCE LINES 141-148

.. code-block:: Python

    c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
    data = VectorData(
        name='species',
        description='...',
        data=TermSetWrapper(value=c_data, termset=terms, field='species')
    )


.. GENERATED FROM PYTHON SOURCE LINES 149-154

Validate Attributes with TermSetWrapper
----------------------------------------------------
Similar to wrapping datasets, :py:class:`~hdmf.term_set.TermSetWrapper` can be wrapped around any attribute.
To validate attributes, the user sets the attribute to the wrapped value. Validation must pass
for the object to be created.

.. GENERATED FROM PYTHON SOURCE LINES 154-160

.. code-block:: Python

    data = VectorData(
        name='species',
        description=TermSetWrapper(value='Homo sapiens', termset=terms),
        data=['Human']
    )


.. GENERATED FROM PYTHON SOURCE LINES 161-165

Validate on append with TermSetWrapper
----------------------------------------------------
When using a :py:class:`~hdmf.term_set.TermSetWrapper`, all new data is validated.
This includes data added with ``append`` and ``extend``.

.. GENERATED FROM PYTHON SOURCE LINES 165-174

.. code-block:: Python

    data = VectorData(
        name='species',
        description='...',
        data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
    )

    data.append('Ursus arctos horribilis')
    data.extend(['Mus musculus', 'Myrmecophaga tridactyla'])


.. GENERATED FROM PYTHON SOURCE LINES 175-180

Validate Data in a DynamicTable
----------------------------------------------------
Which data in a :py:class:`~hdmf.common.table.DynamicTable` is validated is determined by which columns were
initialized with a :py:class:`~hdmf.term_set.TermSetWrapper`. The data is validated when the columns
are created and when they are modified using ``DynamicTable.add_row``.

.. GENERATED FROM PYTHON SOURCE LINES 180-192

.. code-block:: Python

    col1 = VectorData(
        name='Species_1',
        description='...',
        data=TermSetWrapper(value=['Homo sapiens'], termset=terms),
    )
    col2 = VectorData(
        name='Species_2',
        description='...',
        data=TermSetWrapper(value=['Mus musculus'], termset=terms),
    )
    species = DynamicTable(name='species', description='My species', columns=[col1, col2])


.. GENERATED FROM PYTHON SOURCE LINES 193-201

Validate new rows in a DynamicTable with TermSetWrapper
--------------------------------------------------------
Validating new rows to :py:class:`~hdmf.common.table.DynamicTable` is simple. The
:py:func:`~hdmf.common.table.DynamicTable.add_row` method will automatically check each column for a
:py:class:`~hdmf.term_set.TermSetWrapper`. If a wrapper is being used, then the data will be
validated for that column using that column's :py:class:`~hdmf.term_set.TermSet` from the
:py:class:`~hdmf.term_set.TermSetWrapper`. If there is invalid data, the
row will not be added and the user will be prompted to fix the new data in order to populate the table.

.. GENERATED FROM PYTHON SOURCE LINES 201-203

.. code-block:: Python

    species.add_row(Species_1='Mus musculus', Species_2='Mus musculus')


.. GENERATED FROM PYTHON SOURCE LINES 204-209

Validate new columns in a DynamicTable with TermSetWrapper
-----------------------------------------------------------
To add a column that is validated using :py:class:`~hdmf.term_set.TermSetWrapper`,
wrap the data in the :py:func:`~hdmf.common.table.DynamicTable.add_column`
method as if you were making a new instance of :py:class:`~hdmf.common.table.VectorData`.

.. GENERATED FROM PYTHON SOURCE LINES 209-212

.. code-block:: Python

    species.add_column(
        name='Species_3',
        description='...',
        data=TermSetWrapper(value=['Ursus arctos horribilis', 'Mus musculus'], termset=terms),
    )


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 7.442 seconds)


.. _sphx_glr_download_tutorials_plot_term_set.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_term_set.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_term_set.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_term_set.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_