bcftbx.TabFile

Legacy module providing classes for working with generic tab-delimited data.

The core functionality has moved to the io.tabular module, which now implements the TabFile, TabLine and TabFileIterator classes. Those classes serve as the basis for the backwards-compatible versions provided by this module:

  • TabFile: represents a tab-delimited data file;

  • TabDataLine: represents a line of tab-delimited data;

  • TabFileIterator: simple iterator for tab-delimited data files.

The classes in this module are now deprecated and should not be used in new code; they are likely to be removed in a future release.

class bcftbx.TabFile.TabDataLine(line=None, column_names=None, delimiter='\t', lineno=None, convert=True, allow_underscores_in_numeric_literals=False, line_number=None, convert_values=None)

Class to store a line of data from a tab-delimited file

Values can be accessed by integer index or by column names (if set), e.g.

line = TabDataLine("1\t2\t3",('first','second','third'))

allows the 2nd column of data to be accessed either via line[1] or line['second'].

Values can also be changed, e.g.

line['second'] = new_value

Values are automatically converted to integer or float types as appropriate.
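The conversion rule can be pictured with a minimal sketch. The helper name convert_value is hypothetical and this is not the library's implementation; it only assumes that each field is tried as an integer first, then as a float, and is otherwise left as a string:

```python
def convert_value(text):
    """Return text as an int or float where possible, else unchanged.

    Hypothetical helper illustrating the idea of automatic conversion;
    the real TabDataLine logic may differ in its details.
    """
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            # Not parseable as this type; try the next one
            continue
    return text


# Split a tab-delimited line and convert each field
values = [convert_value(v) for v in "1\t2.5\tchr1".split("\t")]
print(values)  # [1, 2.5, 'chr1']
```

Trying int before float keeps whole numbers as integers rather than turning "1" into 1.0.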

Subsets of data can be created using the 'subset' method.

Line numbers can also be set by the code that creates the line, and queried via the 'lineno' method.

It is possible to use a field delimiter other than tab by explicitly specifying the value of the 'delimiter' argument, e.g. for a comma-delimited line:

line = TabDataLine("1,2,3",delimiter=',')

Check if a line is empty:

if not line: print("Blank line")
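The access patterns described above (lookup by position or by column name, assignment, a custom delimiter, and the emptiness check) can be imitated with a small stand-in class. SketchLine is a hypothetical name used only for illustration; it is not the bcftbx class and it skips type conversion, so values stay as strings:

```python
class SketchLine:
    """Hypothetical stand-in for a data line; not the bcftbx class."""

    def __init__(self, line="", column_names=(), delimiter="\t"):
        self.names = list(column_names)
        # An empty input string gives a line with no values
        self.values = line.split(delimiter) if line else []

    def _position(self, key):
        # Integer keys are positions; anything else is treated as a column name
        return key if isinstance(key, int) else self.names.index(key)

    def __getitem__(self, key):
        return self.values[self._position(key)]

    def __setitem__(self, key, value):
        self.values[self._position(key)] = value

    def __bool__(self):
        # A line without values is falsy, so "if not line" detects blanks
        return bool(self.values)


line = SketchLine("1,2,3", ("first", "second", "third"), delimiter=",")
print(line[1], line["second"])  # both expressions read the 2nd column
line["second"] = "new"
print(line.values)  # ['1', 'new', '3']
if not SketchLine(""):
    print("Blank line")
```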

appendColumn(key, value)

Append keyed values to the data line

This adds a new value along with a header name (i.e. key)

lineno()

Return the line number associated with the line

NB The line number is set by the class or function which created the TabDataLine object; it is not guaranteed by the TabDataLine class itself.

class bcftbx.TabFile.TabFile(filen=None, fp=None, column_names=None, skip_first_line=False, first_line_is_header=False, tab_data_line=None, delimiter='\t', convert=True, allow_underscores_in_numeric_literals=False, keep_commented_lines=False)

Class to get data from a tab-delimited file

Loads data from the specified file into a data structure that can then be queried on a per-line and per-item basis.

Data lines are represented by data line objects which must be TabDataLine-like.

Example usage:

data = TabFile(myfile) # load initial data
print('%s' % len(data)) # report number of lines of data
print('%s' % data.header()) # report header (i.e. column names)
for line in data:
    ... # loop over lines of data
myline = data[0] # fetch first line of data

append(data=None, tabdata=None, tabdataline=None)

Create and append a new data line

Creates a new data line object and appends it to the end of the list of lines.

Optionally the 'data' or 'tabdata' arguments can specify data items which will be used to populate the new line; alternatively 'tabdataline' can provide a TabDataLine-based object to be appended.

If none of these are specified then a default blank TabDataLine-based object is created, appended and returned.

Parameters:
  • data – (optional) a list of data items

  • tabdata – (optional) a string of tab-delimited data items

  • tabdataline – (optional) a TabDataLine-based object

Returns:

Appended data line object.

appendColumn(name, fill_value='')

Append a new (empty) column

Parameters:
  • name – name for the new column

  • fill_value – optional, value to insert into all rows in the new column

computeColumn(column_name, compute_func)

Compute and store values in a new column

For each line of data the computation function will be invoked with the line as the sole argument, and the result will be stored in a new column with the specified name.

Parameters:
  • column_name – name or index of the column to write the computed result to

  • compute_func – callable object that will be invoked to perform the computation
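The behaviour described for computeColumn can be sketched without the library. Here each data line is modelled as a plain dict, and compute_column is a hypothetical free function rather than the TabFile method; the point is that the function receives the whole line and its result lands in a new column:

```python
# Each data line is modelled as a dict (stand-in for a TabDataLine-like object)
lines = [
    {"chrom": "chr1", "start": 100, "end": 250},
    {"chrom": "chr2", "start": 40, "end": 90},
]


def compute_column(lines, column_name, compute_func):
    """Call compute_func with each line and store the result under column_name."""
    for line in lines:
        line[column_name] = compute_func(line)


# Derive a 'length' column from two existing columns
compute_column(lines, "length", lambda line: line["end"] - line["start"])
print([line["length"] for line in lines])  # [150, 50]
```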

indexByLineNumber(n)

Return index of a data line given the file line number

Given the line number n for a line in the original file, returns the index required to access the data for that line in the TabFile object.

Raises an IndexError if no matching line is found.
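The difference between a file line number and a list index can be shown with a short sketch. The data below is invented for illustration, assuming a file whose line 1 was a header and whose line 3 was not kept as data, so line numbers and indices do not line up; index_by_line_number is a hypothetical function, not the TabFile method:

```python
# (line_number, values) pairs; line numbers refer to the original file
records = [
    (2, ["chr1", 100]),
    (4, ["chr2", 200]),
    (5, ["chr3", 300]),
]


def index_by_line_number(records, n):
    """Return the list index of the record that came from file line n."""
    for index, (lineno, _values) in enumerate(records):
        if lineno == n:
            return index
    raise IndexError("No data line with line number %d" % n)


print(index_by_line_number(records, 4))  # 1
try:
    index_by_line_number(records, 3)
except IndexError as exc:
    print("Lookup failed: %s" % exc)
```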

insert(i, data=None, tabdata=None, tabdataline=None)

Create and insert a new data line at a specified index

Creates a new data line object and inserts it into the list of lines at the specified index position 'i' (NB not a line number).

Optionally the 'data' or 'tabdata' arguments can specify data items which will be used to populate the new line; alternatively 'tabdataline' can provide a TabDataLine-based object to be inserted.

Parameters:
  • i – index position to insert the line at

  • data – (optional) a list of data items

  • tabdata – (optional) a string of tab-delimited data items

  • tabdataline – (optional) a TabDataLine-based object

Returns:

New inserted data line object.

nColumns()

Return the number of columns in the file

If the file had a header then this will be the number of header columns; otherwise it will be the number of columns found in the first line of data.

reorderColumns(new_columns)

Rearrange the columns in the file

Parameters:

new_columns – list of column names or indices in the new order

Returns:

New TabFile object
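As a rough illustration of reordering by names or indices while returning a new object, the sketch below works on a header list and rows held as plain lists. reorder_columns is a hypothetical function and the data is invented; it is not how TabFile stores its contents:

```python
def reorder_columns(header, rows, new_columns):
    """Return a new (header, rows) pair with columns in the requested order."""
    # Resolve every entry to an integer position; names are looked up in header
    positions = [c if isinstance(c, int) else header.index(c) for c in new_columns]
    new_header = [header[p] for p in positions]
    new_rows = [[row[p] for p in positions] for row in rows]
    return new_header, new_rows


header = ["chrom", "start", "end"]
rows = [["chr1", 100, 250], ["chr2", 40, 90]]

# Mix a column name and indices in the new ordering
new_header, new_rows = reorder_columns(header, rows, ["end", 0, 1])
print(new_header)  # ['end', 'chrom', 'start']
print(new_rows)    # [[250, 'chr1', 100], [90, 'chr2', 40]]
print(header)      # the original ordering is left untouched
```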

transformColumn(column_name, transform_func)

Apply arbitrary function to a column

For each line of data the transformation function will be invoked with the value of the named column, with the result being written back to that column (overwriting the existing value).

Parameters:
  • column_name – name of column to write transformation result to

  • transform_func – callable object that will be invoked to perform the transformation
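In contrast to computeColumn, the function here receives only the value of one column and its result overwrites that value. A minimal sketch, again modelling lines as dicts and using a hypothetical transform_column function in place of the TabFile method:

```python
lines = [
    {"chrom": "1", "start": 100},
    {"chrom": "2", "start": 40},
]


def transform_column(lines, column_name, transform_func):
    """Replace each value in column_name with transform_func(value)."""
    for line in lines:
        line[column_name] = transform_func(line[column_name])


# Prefix bare chromosome names with 'chr'
transform_column(lines, "chrom", lambda value: "chr" + value)
print([line["chrom"] for line in lines])  # ['chr1', 'chr2']
```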

class bcftbx.TabFile.TabFileIterator(filen=None, fp=None, column_names=None)

Iterate through lines in a tab-delimited file

Class to loop over all lines in a TSV file, returning a TabDataLine object for each record.
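The idea of iterating over a file and yielding one record per line can be sketched with a generator. iter_tab_lines is a hypothetical function that yields plain Python values rather than TabDataLine objects, and the in-memory file stands in for a real TSV file:

```python
import io


def iter_tab_lines(fp, column_names=None):
    """Yield (line_number, record) for each line read from fp.

    The record is a dict keyed by column name when names are given,
    otherwise a list of the raw string values.
    """
    for lineno, raw in enumerate(fp, start=1):
        values = raw.rstrip("\n").split("\t")
        if column_names is not None:
            yield lineno, dict(zip(column_names, values))
        else:
            yield lineno, values


# In-memory stand-in for an open tab-delimited file
fp = io.StringIO("a\t1\nb\t2\n")
records = list(iter_tab_lines(fp, column_names=["name", "count"]))
for lineno, record in records:
    print(lineno, record["name"], record["count"])
```

Reading lazily in this way avoids holding the whole file in memory, which is the usual reason to prefer an iterator over loading everything up front.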