HDF5 Files (pytables)

The PyTables package is required. Installation instructions are available at http://pytables.github.com/usersguide/installation.html; on Debian-based systems, apt-get install python-tables may also work.

petlx.hdf5.fromhdf5(source, where=None, name=None, condition=None, condvars=None, start=None, stop=None, step=None)

Provides access to an HDF5 table. E.g.:

>>> from petl import look
>>> from petlx.hdf5 import fromhdf5
>>> table1 = fromhdf5('test1.h5', '/testgroup', 'testtable')
>>> look(table1)
+-------+----------+
| 'foo' | 'bar'    |
+=======+==========+
| 1     | 'asdfgh' |
+-------+----------+
| 2     | 'qwerty' |
+-------+----------+
| 3     | 'zxcvbn' |
+-------+----------+

Some alternative signatures:

>>> # just specify path to table node
... table1 = fromhdf5('test1.h5', '/testgroup/testtable')
>>> 
>>> # use an existing tables.File object
... import tables
>>> h5file = tables.openFile('test1.h5')
>>> table1 = fromhdf5(h5file, '/testgroup/testtable')
>>> 
>>> # use an existing tables.Table object
... h5tbl = h5file.getNode('/testgroup/testtable')
>>> table1 = fromhdf5(h5tbl)
>>> 
>>> # use a condition to filter data
... table2 = fromhdf5(h5tbl, condition="(foo < 3)")
>>> look(table2)
+-------+----------+
| 'foo' | 'bar'    |
+=======+==========+
| 1     | 'asdfgh' |
+-------+----------+
| 2     | 'qwerty' |
+-------+----------+
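
The condition, start, stop and step arguments are not all demonstrated above. The following plain-Python sketch models how a row range and a filter might combine. It is not petlx's implementation: select_rows is a hypothetical helper, the predicate is a Python callable rather than a condition string, and it assumes the range is applied before the condition.

```python
from itertools import islice


def select_rows(table, predicate=None, start=None, stop=None, step=None):
    """Yield the header, then the data rows in [start:stop:step] that pass predicate.

    Hypothetical helper for illustration only; not part of petlx.
    """
    it = iter(table)
    header = next(it)
    yield tuple(header)
    # Assumption: the row range is applied first, then the filter.
    for row in islice(it, start, stop, step):
        record = dict(zip(header, row))
        if predicate is None or predicate(record):
            yield tuple(row)


table1 = (('foo', 'bar'),
          (1, 'asdfgh'),
          (2, 'qwerty'),
          (3, 'zxcvbn'))

# Counterpart of condition="(foo < 3)"
filtered = list(select_rows(table1, predicate=lambda rec: rec['foo'] < 3))

# Skip the first data row, keep the rest
ranged = list(select_rows(table1, start=1))
```

Here filtered holds the header plus the rows with foo equal to 1 and 2, matching the output shown above, and ranged holds the header plus the last two rows.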

New in version 0.3.

petlx.hdf5.fromhdf5sorted(source, where=None, name=None, sortby=None, checkCSI=False, start=None, stop=None, step=None)

Provides access to an HDF5 table, sorted by an indexed column, e.g.:

>>> # set up a new hdf5 table to demonstrate with
... import tables
>>> h5file = tables.openFile("test1.h5", mode="w", title="Test file")
>>> h5file.createGroup('/', 'testgroup', 'Test Group')
/testgroup (Group) 'Test Group'
  children := []
>>> class FooBar(tables.IsDescription):
...     foo = tables.Int32Col(pos=0)
...     bar = tables.StringCol(6, pos=2)
... 
>>> h5table = h5file.createTable('/testgroup', 'testtable', FooBar, 'Test Table')
>>> 
>>> # load some data into the table
... table1 = (('foo', 'bar'),
...           (3, 'asdfgh'),
...           (2, 'qwerty'),
...           (1, 'zxcvbn'))
>>> 
>>> for row in table1[1:]:
...     for i, f in enumerate(table1[0]):
...         h5table.row[f] = row[i]
...     h5table.row.append()
... 
>>> h5table.cols.foo.createCSIndex() # CS index is required
0
>>> h5file.flush()
>>> h5file.close()
>>> 
>>> # access the data, sorted by the indexed column
... from petl import look
>>> from petlx.hdf5 import fromhdf5sorted
>>> table2 = fromhdf5sorted('test1.h5', '/testgroup', 'testtable', sortby='foo')
>>> look(table2)
+-------+----------+
| 'foo' | 'bar'    |
+=======+==========+
| 1     | 'zxcvbn' |
+-------+----------+
| 2     | 'qwerty' |
+-------+----------+
| 3     | 'asdfgh' |
+-------+----------+
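
As a rough mental model, the result resembles sorting the rows on the chosen column and then slicing. The sketch below is plain Python, not petlx code: sorted_rows is a hypothetical helper, it sorts in memory instead of using an index, and it assumes start, stop and step select from the sorted sequence.

```python
def sorted_rows(table, sortby, start=None, stop=None, step=None):
    """Return the header followed by data rows ordered by the sortby column.

    Hypothetical helper for illustration only; an indexed HDF5 column
    avoids the in-memory sort performed here.
    """
    header = tuple(table[0])
    rows = [tuple(row) for row in table[1:]]
    idx = header.index(sortby)
    ordered = sorted(rows, key=lambda row: row[idx])
    # Assumption: the slice applies to the sorted order.
    return [header] + ordered[start:stop:step]


table1 = (('foo', 'bar'),
          (3, 'asdfgh'),
          (2, 'qwerty'),
          (1, 'zxcvbn'))

by_foo = sorted_rows(table1, 'foo')
every_other = sorted_rows(table1, 'foo', step=2)
```

With the data loaded above, by_foo lists the rows in the order 1, 2, 3, the same order as the look() output.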

New in version 0.3.

petlx.hdf5.tohdf5(table, source, where=None, name=None, create=False, description=None, title='', filters=None, expectedrows=10000, chunkshape=None, byteorder=None, createparents=False, sample=1000)

Write to an HDF5 table. If create is False, the table is assumed to exist already, and an attempt is made to truncate it before loading. If create is True, any existing table is dropped and a new table is created; if description is None, the datatype is guessed. E.g.:

>>> from petl import look
>>> look(table1)
+-------+----------+
| 'foo' | 'bar'    |
+=======+==========+
| 1     | 'asdfgh' |
+-------+----------+
| 2     | 'qwerty' |
+-------+----------+
| 3     | 'zxcvbn' |
+-------+----------+

>>> from petlx.hdf5 import tohdf5, fromhdf5
>>> tohdf5(table1, 'test1.h5', '/testgroup', 'testtable', create=True, createparents=True)
>>> look(fromhdf5('test1.h5', '/testgroup', 'testtable'))
+-------+----------+
| 'foo' | 'bar'    |
+=======+==========+
| 1     | 'asdfgh' |
+-------+----------+
| 2     | 'qwerty' |
+-------+----------+
| 3     | 'zxcvbn' |
+-------+----------+
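
When description is None the datatype is guessed. The sketch below shows one way such a guess could work, in plain Python. It is not the logic petlx uses: guess_column_types is a hypothetical helper, the type labels are made up for illustration, and it assumes that sample limits how many rows are inspected.

```python
from itertools import islice


def guess_column_types(table, sample=1000):
    """Guess a simple type label per column from the first `sample` data rows.

    Hypothetical helper for illustration only; not part of petlx.
    """
    it = iter(table)
    header = next(it)
    rows = list(islice(it, sample))
    guessed = {}
    for i, field in enumerate(header):
        values = [row[i] for row in rows]
        # bool is a subclass of int, so exclude it from the numeric checks
        numeric = [isinstance(v, (int, float)) and not isinstance(v, bool)
                   for v in values]
        if values and all(isinstance(v, int) and ok
                          for v, ok in zip(values, numeric)):
            guessed[field] = 'int'
        elif values and all(numeric):
            guessed[field] = 'float'
        else:
            # Fall back to a fixed-width string wide enough for the sample
            width = max((len(str(v)) for v in values), default=0)
            guessed[field] = 'string[%d]' % width
    return guessed


table1 = (('foo', 'bar'),
          (1, 'asdfgh'),
          (2, 'qwerty'),
          (3, 'zxcvbn'))

types = guess_column_types(table1)
```

For the example table this gives an integer column for foo and a six-character string column for bar. Passing an explicit description avoids relying on any guess.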

See also appendhdf5().

New in version 0.3.

petlx.hdf5.appendhdf5(table, source, where=None, name=None)

Like tohdf5(), but does not truncate the table before loading.
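
The difference between the two loading modes can be modelled with an ordinary list. This is a conceptual sketch only: load_rows is a hypothetical helper, and no HDF5 file is involved.

```python
def load_rows(existing_rows, table, truncate):
    """Model loading `table` into a store that already holds `existing_rows`.

    Hypothetical helper for illustration only; not part of petlx.
    truncate=True models tohdf5(), truncate=False models appendhdf5().
    """
    rows = [] if truncate else list(existing_rows)
    # Skip the header row; only data rows are stored
    rows.extend(tuple(row) for row in table[1:])
    return rows


existing = [(1, 'asdfgh'), (2, 'qwerty')]
new_table = (('foo', 'bar'),
             (3, 'zxcvbn'))

replaced = load_rows(existing, new_table, truncate=True)
appended = load_rows(existing, new_table, truncate=False)
```

Here replaced contains only the new row, while appended keeps the two existing rows and adds the new one after them.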

New in version 0.3.
