Variant Call Format Utilities (PyVCF)

petlx.vcf.fromvcf(filename, chrom=None, start=None, end=None, samples=True)

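Returns a table providing access to data from a Variant Call Format (VCF) file. Each row is one variant record. The per-sample call columns follow the fixed VCF fields and can be omitted by passing samples=False. E.g.: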

>>> from petl import look
>>> from petlx.vcf import fromvcf
>>> t = fromvcf('example.vcf')
>>> look(t)
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| 'CHROM' | 'POS'   | 'ID'        | 'REF' | 'ALT'     | 'QUAL' | 'FILTER' | 'INFO'                                                                                  | 'NA00001'                                                           | 'NA00002'                                                        | 'NA00003'                                                            |
+=========+=========+=============+=======+===========+========+==========+=========================================================================================+=====================================================================+==================================================================+======================================================================+
| '19'    |     111 | None        | 'A'   | [C]       |    9.6 | []       | {}                                                                                      | Call(sample=NA00001, CallData(GT=0|0, HQ=[10, 10]))                 | Call(sample=NA00002, CallData(GT=0|0, HQ=[10, 10]))              | Call(sample=NA00003, CallData(GT=0/1, HQ=[3, 3]))                    |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| '19'    |     112 | None        | 'A'   | [G]       |     10 | []       | {}                                                                                      | Call(sample=NA00001, CallData(GT=0|0, HQ=[10, 10]))                 | Call(sample=NA00002, CallData(GT=0|0, HQ=[10, 10]))              | Call(sample=NA00003, CallData(GT=0/1, HQ=[3, 3]))                    |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| '20'    |   14370 | 'rs6054257' | 'G'   | [A]       |     29 | []       | OrderedDict([('NS', 3), ('DP', 14), ('AF', [0.5]), ('DB', True), ('H2', True)])         | Call(sample=NA00001, CallData(GT=0|0, GQ=48, DP=1, HQ=[51, 51]))    | Call(sample=NA00002, CallData(GT=1|0, GQ=48, DP=8, HQ=[51, 51])) | Call(sample=NA00003, CallData(GT=1/1, GQ=43, DP=5, HQ=[None, None])) |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| '20'    |   17330 | None        | 'T'   | [A]       |      3 | ['q10']  | OrderedDict([('NS', 3), ('DP', 11), ('AF', [0.017])])                                   | Call(sample=NA00001, CallData(GT=0|0, GQ=49, DP=3, HQ=[58, 50]))    | Call(sample=NA00002, CallData(GT=0|1, GQ=3, DP=5, HQ=[65, 3]))   | Call(sample=NA00003, CallData(GT=0/0, GQ=41, DP=3, HQ=[None, None])) |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| '20'    | 1110696 | 'rs6040355' | 'A'   | [G, T]    |     67 | []       | OrderedDict([('NS', 2), ('DP', 10), ('AF', [0.333, 0.667]), ('AA', 'T'), ('DB', True)]) | Call(sample=NA00001, CallData(GT=1|2, GQ=21, DP=6, HQ=[23, 27]))    | Call(sample=NA00002, CallData(GT=2|1, GQ=2, DP=0, HQ=[18, 2]))   | Call(sample=NA00003, CallData(GT=2/2, GQ=35, DP=4, HQ=[None, None])) |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| '20'    | 1230237 | None        | 'T'   | [None]    |     47 | []       | OrderedDict([('NS', 3), ('DP', 13), ('AA', 'T')])                                       | Call(sample=NA00001, CallData(GT=0|0, GQ=54, DP=None, HQ=[56, 60])) | Call(sample=NA00002, CallData(GT=0|0, GQ=48, DP=4, HQ=[51, 51])) | Call(sample=NA00003, CallData(GT=0/0, GQ=61, DP=2, HQ=[None, None])) |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| '20'    | 1234567 | 'microsat1' | 'G'   | [GA, GAC] |     50 | []       | OrderedDict([('NS', 3), ('DP', 9), ('AA', 'G'), ('AN', 6), ('AC', [3, 1])])             | Call(sample=NA00001, CallData(GT=0/1, GQ=None, DP=4))               | Call(sample=NA00002, CallData(GT=0/2, GQ=17, DP=2))              | Call(sample=NA00003, CallData(GT=None, GQ=40, DP=3))                 |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| '20'    | 1235237 | None        | 'T'   | [None]    | None   | []       | {}                                                                                      | Call(sample=NA00001, CallData(GT=0/0))                              | Call(sample=NA00002, CallData(GT=0|0))                           | Call(sample=NA00003, CallData(GT=None))                              |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
| 'X'     |      10 | 'rsTest'    | 'AC'  | [A, ATG]  |     10 | []       | {}                                                                                      | Call(sample=NA00001, CallData(GT=0))                                | Call(sample=NA00002, CallData(GT=0/1))                           | Call(sample=NA00003, CallData(GT=0|2))                               |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+---------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------------------------------------------+
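
To make the column layout above more concrete, here is a minimal pure-Python sketch of how one VCF data line could map onto the fixed fields. It is not how petlx or PyVCF parse files: the helper `parse_vcf_line` is hypothetical, the input line is only modelled on the 20:14370 record (its FILTER value 'PASS' is an assumption), and INFO values are left as strings because typed conversion needs the file's header metadata, which this sketch ignores.

```python
# Names of the eight fixed, tab-separated VCF columns.
FIXED = ['CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO']


def parse_vcf_line(line):
    """Split one VCF data line into a dict of its fixed fields.

    Hypothetical helper for illustration only; sample columns are ignored.
    """
    parts = line.rstrip('\n').split('\t')
    rec = dict(zip(FIXED, parts[:8]))
    rec['POS'] = int(rec['POS'])
    # '.' marks a missing value in VCF.
    rec['ID'] = None if rec['ID'] == '.' else rec['ID']
    rec['ALT'] = rec['ALT'].split(',')
    # Simplification: QUAL is always converted to float here.
    rec['QUAL'] = None if rec['QUAL'] == '.' else float(rec['QUAL'])
    info = {}
    if rec['INFO'] != '.':
        for item in rec['INFO'].split(';'):
            key, sep, value = item.partition('=')
            # Entries without '=' are flags and are stored as True.
            info[key] = value if sep else True
    rec['INFO'] = info
    return rec


line = '20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2'
rec = parse_vcf_line(line)
print(rec['CHROM'], rec['POS'], rec['ID'], rec['ALT'], rec['INFO'])
```

The flags DB and H2 come out as True, matching how they appear in the INFO column above.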

New in version 0.5.

petlx.vcf.unpackinfo(tbl, *keys, **kwargs)

Unpack the INFO field into separate fields, one per INFO key. Rows whose INFO lacks a key get None in that field. E.g.:

>>> from petlx.vcf import fromvcf, unpackinfo
>>> from petl import look
>>> t1 = fromvcf('../fixture/sample.vcf', samples=False)
>>> look(t1)
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| 'CHROM' | 'POS'   | 'ID'        | 'REF' | 'ALT'     | 'QUAL' | 'FILTER' | 'INFO'                                                                                  |
+=========+=========+=============+=======+===========+========+==========+=========================================================================================+
| '19'    |     111 | None        | 'A'   | [C]       |    9.6 | []       | {}                                                                                      |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| '19'    |     112 | None        | 'A'   | [G]       |     10 | []       | {}                                                                                      |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| '20'    |   14370 | 'rs6054257' | 'G'   | [A]       |     29 | []       | OrderedDict([('NS', 3), ('DP', 14), ('AF', [0.5]), ('DB', True), ('H2', True)])         |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| '20'    |   17330 | None        | 'T'   | [A]       |      3 | ['q10']  | OrderedDict([('NS', 3), ('DP', 11), ('AF', [0.017])])                                   |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| '20'    | 1110696 | 'rs6040355' | 'A'   | [G, T]    |     67 | []       | OrderedDict([('NS', 2), ('DP', 10), ('AF', [0.333, 0.667]), ('AA', 'T'), ('DB', True)]) |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| '20'    | 1230237 | None        | 'T'   | [None]    |     47 | []       | OrderedDict([('NS', 3), ('DP', 13), ('AA', 'T')])                                       |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| '20'    | 1234567 | 'microsat1' | 'G'   | [GA, GAC] |     50 | []       | OrderedDict([('NS', 3), ('DP', 9), ('AA', 'G'), ('AN', 6), ('AC', [3, 1])])             |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| '20'    | 1235237 | None        | 'T'   | [None]    | None   | []       | {}                                                                                      |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+
| 'X'     |      10 | 'rsTest'    | 'AC'  | [A, ATG]  |     10 | []       | {}                                                                                      |
+---------+---------+-------------+-------+-----------+--------+----------+-----------------------------------------------------------------------------------------+

>>> t2 = unpackinfo(t1)
>>> look(t2)
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| 'CHROM' | 'POS'   | 'ID'        | 'REF' | 'ALT'     | 'QUAL' | 'FILTER' | 'NS' | 'AN' | 'AC'   | 'DP' | 'AF'           | 'AA' | 'DB' | 'H2' |
+=========+=========+=============+=======+===========+========+==========+======+======+========+======+================+======+======+======+
| '19'    |     111 | None        | 'A'   | [C]       |    9.6 | []       | None | None | None   | None | None           | None | None | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| '19'    |     112 | None        | 'A'   | [G]       |     10 | []       | None | None | None   | None | None           | None | None | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| '20'    |   14370 | 'rs6054257' | 'G'   | [A]       |     29 | []       |    3 | None | None   |   14 | [0.5]          | None | True | True |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| '20'    |   17330 | None        | 'T'   | [A]       |      3 | ['q10']  |    3 | None | None   |   11 | [0.017]        | None | None | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| '20'    | 1110696 | 'rs6040355' | 'A'   | [G, T]    |     67 | []       |    2 | None | None   |   10 | [0.333, 0.667] | 'T'  | True | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| '20'    | 1230237 | None        | 'T'   | [None]    |     47 | []       |    3 | None | None   |   13 | None           | 'T'  | None | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| '20'    | 1234567 | 'microsat1' | 'G'   | [GA, GAC] |     50 | []       |    3 |    6 | [3, 1] |    9 | None           | 'G'  | None | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| '20'    | 1235237 | None        | 'T'   | [None]    | None   | []       | None | None | None   | None | None           | None | None | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
| 'X'     |      10 | 'rsTest'    | 'AC'  | [A, ATG]  |     10 | []       | None | None | None   | None | None           | None | None | None |
+---------+---------+-------------+-------+-----------+--------+----------+------+------+--------+------+----------------+------+------+------+
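
The unpacking step can be pictured in plain Python. This is a minimal sketch, not the petlx implementation: the helper `unpack_info_rows` is hypothetical, the table is a plain list of tuples with the header first, and INFO values are ordinary dicts. It orders the new fields by first appearance across rows, which differs from the field order petlx shows above.

```python
def unpack_info_rows(table, keys=None):
    """Replace the 'INFO' column with one column per INFO key.

    `table` is a list of rows whose first row is the header.
    Hypothetical helper for illustration only.
    """
    header = list(table[0])
    rows = [list(r) for r in table[1:]]
    idx = header.index('INFO')
    if keys is None:
        # Collect keys in order of first appearance across all rows.
        keys = []
        for row in rows:
            for k in row[idx]:
                if k not in keys:
                    keys.append(k)
    out = [tuple(header[:idx] + list(keys) + header[idx + 1:])]
    for row in rows:
        info = row[idx]
        # Keys missing from this row's INFO become None.
        values = [info.get(k) for k in keys]
        out.append(tuple(row[:idx] + values + row[idx + 1:]))
    return out


table = [
    ('CHROM', 'POS', 'INFO'),
    ('19', 111, {}),
    ('20', 14370, {'NS': 3, 'DP': 14, 'AF': [0.5]}),
    ('20', 1230237, {'NS': 3, 'DP': 13, 'AA': 'T'}),
]
unpacked = unpack_info_rows(table)
for row in unpacked:
    print(row)
```

Passing an explicit `keys` list to the sketch restricts the output to those fields and fixes their order.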

New in version 0.5.

petlx.vcf.meltsamples(tbl, *samples)

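Melt the sample columns into rows: each output row pairs one variant with one sample, with the sample name in the 'SAMPLE' field and its call in the 'CALL' field. E.g.: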

>>> from petlx.vcf import fromvcf, meltsamples
>>> from petl import look, cutout
>>> t1 = fromvcf('../fixture/sample.vcf')
>>> t2 = meltsamples(t1)
>>> t3 = cutout(t2, 'INFO')
>>> look(t3)
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| 'CHROM' | 'POS' | 'ID'        | 'REF' | 'ALT' | 'QUAL' | 'FILTER' | 'SAMPLE'  | 'CALL'                                                               |
+=========+=======+=============+=======+=======+========+==========+===========+======================================================================+
| '19'    |   111 | None        | 'A'   | [C]   |    9.6 | []       | 'NA00001' | Call(sample=NA00001, CallData(GT=0|0, HQ=[10, 10]))                  |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '19'    |   111 | None        | 'A'   | [C]   |    9.6 | []       | 'NA00002' | Call(sample=NA00002, CallData(GT=0|0, HQ=[10, 10]))                  |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '19'    |   111 | None        | 'A'   | [C]   |    9.6 | []       | 'NA00003' | Call(sample=NA00003, CallData(GT=0/1, HQ=[3, 3]))                    |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '19'    |   112 | None        | 'A'   | [G]   |     10 | []       | 'NA00001' | Call(sample=NA00001, CallData(GT=0|0, HQ=[10, 10]))                  |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '19'    |   112 | None        | 'A'   | [G]   |     10 | []       | 'NA00002' | Call(sample=NA00002, CallData(GT=0|0, HQ=[10, 10]))                  |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '19'    |   112 | None        | 'A'   | [G]   |     10 | []       | 'NA00003' | Call(sample=NA00003, CallData(GT=0/1, HQ=[3, 3]))                    |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '20'    | 14370 | 'rs6054257' | 'G'   | [A]   |     29 | []       | 'NA00001' | Call(sample=NA00001, CallData(GT=0|0, GQ=48, DP=1, HQ=[51, 51]))     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '20'    | 14370 | 'rs6054257' | 'G'   | [A]   |     29 | []       | 'NA00002' | Call(sample=NA00002, CallData(GT=1|0, GQ=48, DP=8, HQ=[51, 51]))     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '20'    | 14370 | 'rs6054257' | 'G'   | [A]   |     29 | []       | 'NA00003' | Call(sample=NA00003, CallData(GT=1/1, GQ=43, DP=5, HQ=[None, None])) |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
| '20'    | 17330 | None        | 'T'   | [A]   |      3 | ['q10']  | 'NA00001' | Call(sample=NA00001, CallData(GT=0|0, GQ=49, DP=3, HQ=[58, 50]))     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+----------------------------------------------------------------------+
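
The reshaping shown above can be sketched in plain Python. This is an illustration under stated assumptions, not the petlx implementation: the helper `melt_sample_columns` and its `fixed_count` parameter are hypothetical, and calls are represented as plain genotype strings rather than PyVCF call objects.

```python
def melt_sample_columns(table, fixed_count):
    """Yield one row per (variant, sample) pair.

    `table` is a list of rows whose first row is the header. The first
    `fixed_count` columns are kept as they are; every remaining column is
    treated as a sample column. Hypothetical helper for illustration only.
    """
    header = table[0]
    fixed = tuple(header[:fixed_count])
    samples = header[fixed_count:]
    yield fixed + ('SAMPLE', 'CALL')
    for row in table[1:]:
        # Emit the fixed fields once per sample, with that sample's call.
        for name, call in zip(samples, row[fixed_count:]):
            yield tuple(row[:fixed_count]) + (name, call)


table = [
    ('CHROM', 'POS', 'NA00001', 'NA00002'),
    ('19', 111, '0|0', '0|0'),
    ('19', 112, '0|0', '0/1'),
]
melted = list(melt_sample_columns(table, 2))
for row in melted:
    print(row)
```

Two variants with two samples each give four data rows, in the same variant-then-sample order as the output above.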

New in version 0.5.

petlx.vcf.unpackcall(tbl, *keys, **kwargs)

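Unpack the 'CALL' field of a melted table into separate fields, one per call data key (here GT, GQ, DP and HQ). Keys missing from a call get None. E.g.: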

>>> from petlx.vcf import fromvcf, meltsamples, unpackcall
>>> from petl import look, cutout
>>> t1 = fromvcf('../fixture/sample.vcf')
>>> t2 = meltsamples(t1)
>>> t3 = unpackcall(t2)
>>> t4 = cutout(t3, 'INFO')
>>> look(t4)
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| 'CHROM' | 'POS' | 'ID'        | 'REF' | 'ALT' | 'QUAL' | 'FILTER' | 'SAMPLE'  | 'GT'  | 'GQ' | 'DP' | 'HQ'         |
+=========+=======+=============+=======+=======+========+==========+===========+=======+======+======+==============+
| '19'    |   111 | None        | 'A'   | [C]   |    9.6 | []       | 'NA00001' | '0|0' | None | None | [10, 10]     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '19'    |   111 | None        | 'A'   | [C]   |    9.6 | []       | 'NA00002' | '0|0' | None | None | [10, 10]     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '19'    |   111 | None        | 'A'   | [C]   |    9.6 | []       | 'NA00003' | '0/1' | None | None | [3, 3]       |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '19'    |   112 | None        | 'A'   | [G]   |     10 | []       | 'NA00001' | '0|0' | None | None | [10, 10]     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '19'    |   112 | None        | 'A'   | [G]   |     10 | []       | 'NA00002' | '0|0' | None | None | [10, 10]     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '19'    |   112 | None        | 'A'   | [G]   |     10 | []       | 'NA00003' | '0/1' | None | None | [3, 3]       |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '20'    | 14370 | 'rs6054257' | 'G'   | [A]   |     29 | []       | 'NA00001' | '0|0' |   48 |    1 | [51, 51]     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '20'    | 14370 | 'rs6054257' | 'G'   | [A]   |     29 | []       | 'NA00002' | '1|0' |   48 |    8 | [51, 51]     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '20'    | 14370 | 'rs6054257' | 'G'   | [A]   |     29 | []       | 'NA00003' | '1/1' |   43 |    5 | [None, None] |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
| '20'    | 17330 | None        | 'T'   | [A]   |      3 | ['q10']  | 'NA00001' | '0|0' |   49 |    3 | [58, 50]     |
+---------+-------+-------------+-------+-------+--------+----------+-----------+-------+------+------+--------------+
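
A rough picture of this step in plain Python, assuming each call's data behaves like a named tuple whose fields vary from record to record. The names `ShortCall`, `FullCall` and `unpack_call_rows` are hypothetical stand-ins, not PyVCF or petlx objects, and the keys are passed explicitly rather than discovered.

```python
from collections import namedtuple

# Stand-ins for per-call data. Different records can carry different keys.
ShortCall = namedtuple('ShortCall', ['GT', 'HQ'])
FullCall = namedtuple('FullCall', ['GT', 'GQ', 'DP', 'HQ'])


def unpack_call_rows(table, keys):
    """Replace the 'CALL' column with one column per requested key.

    Hypothetical helper for illustration only.
    """
    header = list(table[0])
    idx = header.index('CALL')
    out = [tuple(header[:idx] + list(keys) + header[idx + 1:])]
    for row in table[1:]:
        row = list(row)
        call = row[idx]
        # A key that this call does not carry is reported as None.
        values = [getattr(call, k, None) for k in keys]
        out.append(tuple(row[:idx] + values + row[idx + 1:]))
    return out


melted = [
    ('POS', 'SAMPLE', 'CALL'),
    (111, 'NA00001', ShortCall(GT='0|0', HQ=[10, 10])),
    (14370, 'NA00003', FullCall(GT='1/1', GQ=43, DP=5, HQ=[None, None])),
]
unpacked = unpack_call_rows(melted, ['GT', 'GQ', 'DP', 'HQ'])
for row in unpacked:
    print(row)
```

The first data row has no GQ or DP, so those fields come out as None, as in the rows for position 111 above.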

New in version 0.5.
