AlignedDynamicTable

This is a user guide to interacting with AlignedDynamicTable objects.

Introduction

The class AlignedDynamicTable represents a column-based table with support for grouping columns by category. AlignedDynamicTable inherits from DynamicTable and may contain additional DynamicTable objects, one per sub-category. All tables must align, i.e., they are required to have the same number of rows. Some key features of AlignedDynamicTable are:

When to use (and not use) AlignedDynamicTable?

AlignedDynamicTable is a useful data structure but it is also fairly complex, consisting of multiple DynamicTable objects, each of which is itself a complex type composed of many datasets and attributes. In general, if a simpler data structure is sufficient, then consider using those instead. For example, consider using instead:

  • DynamicTable if a regular table is sufficient.

  • A compound dataset via Table if all columns of a table are fixed and fast, column-based access is not critical but fast row-based access is.

  • Multiple, separate tables if using AlignedDynamicTable would lead to duplication of data (i.e., de-normalize data), e.g., by having to replicate values across rows of the table.

Use AlignedDynamicTable when:

  • When you need to group columns in a DynamicTable by category

  • Need to avoid name collisions between columns in a DynamicTable and creating compound columns is not an option

Constructing a table

To create an AlignedDynamicTable, call the constructor with:

  • name string with the name of the table, and

  • description string to describe the table.

from hdmf.common import AlignedDynamicTable

customer_table = AlignedDynamicTable(
    name='customers',
    description='an example aligned table',
)

Initializing columns of the primary table

The basic behavior of adding data and initalizing AlignedDynamicTable is the same as in DynamicTable. See the DynamicTable How-To Guide for details. E.g., using the columns and colnames parameters (which are inherited from DynamicTable) we can define the columns of the primary table. All columns must have the same length.

from hdmf.common import VectorData

col1 = VectorData(
    name='firstname',
    description='Customer first name',
    data=['Peter', 'Emma']
)
col2 = VectorData(
    name='lastname',
    description='Customer last name',
    data=['Williams', 'Brown']
)

customer_table = AlignedDynamicTable(
    name='customer',
    description='an example aligned table',
    columns=[col1, col2]
)

Initializing categories

By specifying the category_tables as a list of DynamicTable objects we can then directly specify the sub-category tables. Optionally, we can also set the categories names of the sub-tables as an array of strings to define the ordering of categories.

from hdmf.common import DynamicTable

# create the home_address category table
subcol1 = VectorData(
    name='city',
    description='city',
    data=['Rivercity', 'Mountaincity']
)
subcol2 = VectorData(
    name='street',
    description='street data',
    data=['Amazonstreet', 'Alpinestreet']
)
homeaddress_table = DynamicTable(
    name='home_address',
    description='home address of the customer',
    columns=[subcol1, subcol2]
)

# create the table
customer_table = AlignedDynamicTable(
    name='customer',
    description='an example aligned table',
    columns=[col1, col2],
    category_tables=[homeaddress_table, ]
)

# render the table in the online docs
customer_table.to_dataframe()
customer home_address
firstname lastname id city street
(customer, id)
0 Peter Williams 0 Rivercity Amazonstreet
1 Emma Brown 1 Mountaincity Alpinestreet


Adding more data to the table

We can add rows, columns, and new categories to the table.

Adding a row

To add a row via add_row we can either: 1) provide the row data as a single dict to the data parameter or 2) specify a dict for each category and column as keyword arguments. Additional optional arguments include id and enforce_unique_id.

customer_table.add_row(
    firstname='Paul',
    lastname='Smith',
    home_address={'city': 'Bugcity',
                  'street': 'Beestree'}
)

# render the table in the online docs
customer_table.to_dataframe()
customer home_address
firstname lastname id city street
(customer, id)
0 Peter Williams 0 Rivercity Amazonstreet
1 Emma Brown 1 Mountaincity Alpinestreet
2 Paul Smith 2 Bugcity Beestree


Adding a column

To add a columns we use add_column.

customer_table.add_column(
    name='zipcode',
    description='zip code of the city',
    data=[11111, 22222, 33333],  # specify data for the 3 rows in the table
    category='home_address'  # use None (or omit) to add columns to the primary table
)

# render the table in the online docs
customer_table.to_dataframe()
customer home_address
firstname lastname id city street zipcode
(customer, id)
0 Peter Williams 0 Rivercity Amazonstreet 11111
1 Emma Brown 1 Mountaincity Alpinestreet 22222
2 Paul Smith 2 Bugcity Beestree 33333


Adding a category

To add a new DynamicTable as a category, we use add_category.

Note

Only regular DynamicTables are allowed as category tables. Using an AlignedDynamicTable as a category for another AlignedDynamicTable is currently not supported.

# create a new category DynamicTable for the work address
subcol1 = VectorData(
    name='city',
    description='city',
    data=['Busycity', 'Worktown', 'Labortown']
)
subcol2 = VectorData(
    name='street',
    description='street data',
    data=['Cannery Row', 'Woodwork Avenue', 'Steel Street']
)
subcol3 = VectorData(
    name='zipcode',
    description='zip code of the city',
    data=[33333, 44444, 55555])
workaddress_table = DynamicTable(
    name='work_address',
    description='home address of the customer',
    columns=[subcol1, subcol2, subcol3]
)

# add the category to our AlignedDynamicTable
customer_table.add_category(category=workaddress_table)

# render the table in the online docs
customer_table.to_dataframe()
customer home_address work_address
firstname lastname id city street zipcode id city street zipcode
(customer, id)
0 Peter Williams 0 Rivercity Amazonstreet 11111 0 Busycity Cannery Row 33333
1 Emma Brown 1 Mountaincity Alpinestreet 22222 1 Worktown Woodwork Avenue 44444
2 Paul Smith 2 Bugcity Beestree 33333 2 Labortown Steel Street 55555


Note

Because each category is stored as a separate DynamicTable there are no name collisions between the columns of the home_address and work_address tables, so that both can contain matching city, street, and zipcode columns. However, since a category table is a sub-part of the primary table, categories must not have the same name as other columns or other categories in the primary table.

Accessing categories, columns, rows, and cells

Convert to a pandas DataFrame

If we need to access the whole table for analysis, then converting the table to pandas DataFrame is a convenient option. To ignore the id columns of all category tables we can simply set the ignore_category_ids parameter.

# render the table in the online docs while ignoring the id column of category tables
customer_table.to_dataframe(ignore_category_ids=True)
customer home_address work_address
firstname lastname city street zipcode city street zipcode
(customer, id)
0 Peter Williams Rivercity Amazonstreet 11111 Busycity Cannery Row 33333
1 Emma Brown Mountaincity Alpinestreet 22222 Worktown Woodwork Avenue 44444
2 Paul Smith Bugcity Beestree 33333 Labortown Steel Street 55555


Accessing categories

# Get the list of all categories
_ = customer_table.categories

# Get the DynamicTable object of a particular category
_ = customer_table.get_category(name='home_address')

# Alternatively, we can use normal array slicing to get the category as a pandas DataFrame.
# NOTE: In contrast to the previous call, the table is here converted to a DataFrame.
_ = customer_table['home_address']

Accessing columns

We can use the standard Python in operator to check if a column exists

# To check if a column exists in the primary table we only need to specify the column name
# or alternatively specify the category as None
_ = 'firstname' in customer_table
_ = (None, 'firstname') in customer_table
# To check if a column exists in a category table we need to specify the category
# and column name as a tuple
_ = ('home_address', 'zipcode') in customer_table

We can use standard array slicing to get the VectorData object of a column.

# To get a column from the primary table we just provide the name.
_ = customer_table['firstname']
# To get a column from a category table we provide both the category name and column name
_ = customer_table['home_address', 'city']

Accessing rows

Accessing rows works much like in DynamicTable How-To Guide

# Get a single row by index as a DataFrame
customer_table[1]
customer home_address work_address
firstname lastname id city street zipcode id city street zipcode
(customer, id)
1 Emma Brown 1 Mountaincity Alpinestreet 22222 1 Worktown Woodwork Avenue 44444


# Get a range of rows as a DataFrame
customer_table[0:2]
customer home_address work_address
firstname lastname id city street zipcode id city street zipcode
(customer, id)
0 Peter Williams 0 Rivercity Amazonstreet 11111 0 Busycity Cannery Row 33333
1 Emma Brown 1 Mountaincity Alpinestreet 22222 1 Worktown Woodwork Avenue 44444


# Get a list of rows as a DataFrame
customer_table[[0, 2]]
customer home_address work_address
firstname lastname id city street zipcode id city street zipcode
(customer, id)
0 Peter Williams 0 Rivercity Amazonstreet 11111 0 Busycity Cannery Row 33333
2 Paul Smith 2 Bugcity Beestree 33333 2 Labortown Steel Street 55555


Accessing cells

To get a set of cells we need to specify the: 1) category, 2) column, and 3) row index when slicing into the table.

When selecting from the primary table we need to specify None for the category, followed by the column name and the selection.

# Select rows 0:2 from the 'firstname' column in the primary table
customer_table[None, 'firstname', 0:2]
['Peter', 'Emma']
# Select rows 1 from the 'firstname' column in the primary table
customer_table[None, 'firstname', 1]
'Emma'
# Select rows 0 and 2 from the 'firstname' column in the primary table
customer_table[None, 'firstname', [0, 2]]
['Peter', 'Paul']
# Select rows 0:2 from the 'city' column of the 'home_address' category table
customer_table['home_address', 'city', 0:2]
['Rivercity', 'Mountaincity']

Gallery generated by Sphinx-Gallery