The HiCData class

A class for handling HiC read data.

class hifive.hic_data.HiCData(filename, mode='r', silent=False)

This class handles interaction count data for HiC experiments.

This class stores mapped paired-end reads, indexing them by fragment-end (fend) number, in an h5dict.

Note

This class is also available as hifive.HiCData

When initialized, this class creates an h5dict in which to store all data associated with this object.

Parameters:
  • filename (str.) – The file name of the h5dict. This should end with the suffix ‘.hdf5’.
  • mode (str.) – The mode to open the h5dict with. Use ‘w’ to create or overwrite the h5dict named by filename.
  • silent (bool.) – Indicates whether to suppress printed information about function execution for this object.
Returns:

HiCData class object.

Attributes:
  • file (str.) - A string containing the name of the file passed during object creation, to which the object is saved.
  • silent (bool.) - A boolean indicating whether to suppress all of the output messages.
  • history (str.) - A string containing all of the commands executed on this object and their outcomes.
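
A minimal usage sketch built only from the signatures documented on this page. It assumes HiFive is installed and that a fend file already exists. The wrapper name build_dataset, the file paths passed to it, and the maxinsert default of 500 are hypothetical placeholders, not part of the library.

```python
def build_dataset(fend_path, raw_paths, out_path, maxinsert=500):
    """Create a HiCData h5dict from tab-separated mapped read files.

    Hypothetical helper: the argument names and the maxinsert default
    are illustrative only and are not library defaults.
    """
    import hifive  # imported lazily; assumes HiFive is installed

    # mode='w' creates (or overwrites) the h5dict named by out_path
    data = hifive.HiCData(out_path, mode='w', silent=True)
    data.load_data_from_raw(fend_path, raw_paths, maxinsert)
    # Explicit save; this may be redundant if the loader already writes the h5dict
    data.save()
    return data
```

A later session could then reopen the same file with the default mode='r' and call load() to restore the saved state.
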
export_to_mat(outfilename)

Write reads loaded in data object to text file in HiCPipe-compatible ‘mat’ format.

Parameters: outfilename (str.) – Specifies the file to save data in.
Returns: None
load()

Load data from h5dict specified at object creation.

Any call of this function will overwrite current object data with values from the last save() call.

Returns: None
load_data_from_bam(fendfilename, filelist, maxinsert)

Read interaction counts from pairs of BAM-formatted alignment files and place them in the h5dict.

Parameters:
  • fendfilename (str.) – This specifies the file name of the Fend object to associate with the dataset.
  • filelist (list) – A list of mapped sequencing runs. Each run should be a list of the first and second read-end BAM files ([[run1_1, run1_2], [run2_1, run2_2], ...]). If only one pair of files is needed, the list may contain both file path strings.
  • maxinsert (int.) – A cutoff for filtering paired end reads whose total distance to their respective restriction sites exceeds this value.
Returns:

None

Attributes:
  • fendfilename (str.) - A string containing the relative path of the fend file.
  • cis_data (ndarray) - A numpy array of type int32 and shape N x 3 where N is the number of valid non-zero intra-chromosomal fend pairings observed in the data. The first column contains the fend index (from the ‘fends’ array in the fend object) of the upstream fend, the second column contains the index of the downstream fend, and the third column contains the number of reads observed for that fend pair.
  • cis_indices (ndarray) - A numpy array of type int64 and a length of the number of fends + 1. Each position contains the first entry for the correspondingly-indexed fend in the first column of ‘cis_data’. For example, all of the downstream cis interactions for the fend at index 5 in the fend object ‘fends’ array are in cis_data[cis_indices[5]:cis_indices[6], :].
  • trans_data (ndarray) - A numpy array of type int32 and shape N x 3 where N is the number of valid non-zero inter-chromosomal fend pairings observed in the data. The first column contains the fend index (from the ‘fends’ array in the fend object) of the upstream fend (upstream also refers to the lower indexed chromosome in this context), the second column contains the index of the downstream fend, and the third column contains the number of reads observed for that fend pair.
  • trans_indices (ndarray) - A numpy array of type int64 and a length of the number of fends + 1. Each position contains the first entry for the correspondingly-indexed fend in the first column of ‘trans_data’. For example, all of the downstream trans interactions for the fend at index 5 in the fend object ‘fends’ array are in trans_data[trans_indices[5]:trans_indices[6], :].
  • frags (filestream) - A filestream to the hdf5 fend file such that all saved fend attributes can be accessed through this class attribute.
  • maxinsert (int.) - An integer denoting the maximum included distance sum between both read ends and their downstream RE site.

When data is loaded the ‘history’ attribute is updated to include the history of the fend file that becomes associated with it.
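
The relationship between ‘cis_data’ and ‘cis_indices’ described above can be sketched with toy data. This is an illustration only: it uses plain Python lists instead of numpy arrays, and build_indices is a hypothetical helper, not a HiFive function. The same slicing pattern applies to ‘trans_data’ and ‘trans_indices’.

```python
def build_indices(rows, num_fends):
    """Return an offsets list of length num_fends + 1.

    rows must be sorted by the upstream fend (first column), so that
    rows[indices[f]:indices[f + 1]] holds every row whose upstream fend is f.
    """
    indices = [0] * (num_fends + 1)
    # Count how many rows start at each upstream fend
    for upstream, _downstream, _count in rows:
        indices[upstream + 1] += 1
    # Running sum turns per-fend counts into start offsets
    for i in range(num_fends):
        indices[i + 1] += indices[i]
    return indices


# Toy rows: (upstream fend, downstream fend, read count), sorted by upstream fend
cis_data = [
    (0, 2, 4),
    (0, 5, 1),
    (2, 3, 7),
    (5, 6, 2),
    (5, 7, 3),
]
cis_indices = build_indices(cis_data, num_fends=8)

# All downstream cis interactions for the fend at index 5
rows_for_fend_5 = cis_data[cis_indices[5]:cis_indices[6]]
```

A fend with no downstream interactions simply yields an empty slice, because its start and end offsets are equal.
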

load_data_from_mat(fendfilename, filename, maxinsert=0)

Read interaction counts from a HiCPipe-compatible ‘mat’ text file and place in h5dict.

Parameters:
  • fendfilename (str.) – This specifies the file name of the Fend object to associate with the dataset.
  • filename (str.) – File name of a ‘mat’ file containing fend pair and interaction count data.
  • maxinsert (int.) – A cutoff for filtering paired end reads whose total distance to their respective restriction sites exceeds this value.
Returns:

None

Attributes:
  • fendfilename (str.) - A string containing the relative path of the fend file.
  • cis_data (ndarray) - A numpy array of type int32 and shape N x 3 where N is the number of valid non-zero intra-chromosomal fend pairings observed in the data. The first column contains the fend index (from the ‘fends’ array in the fend object) of the upstream fend, the second column contains the index of the downstream fend, and the third column contains the number of reads observed for that fend pair.
  • cis_indices (ndarray) - A numpy array of type int64 and a length of the number of fends + 1. Each position contains the first entry for the correspondingly-indexed fend in the first column of ‘cis_data’. For example, all of the downstream cis interactions for the fend at index 5 in the fend object ‘fends’ array are in cis_data[cis_indices[5]:cis_indices[6], :].
  • trans_data (ndarray) - A numpy array of type int32 and shape N x 3 where N is the number of valid non-zero inter-chromosomal fend pairings observed in the data. The first column contains the fend index (from the ‘fends’ array in the fend object) of the upstream fend (upstream also refers to the lower indexed chromosome in this context), the second column contains the index of the downstream fend, and the third column contains the number of reads observed for that fend pair.
  • trans_indices (ndarray) - A numpy array of type int64 and a length of the number of fends + 1. Each position contains the first entry for the correspondingly-indexed fend in the first column of ‘trans_data’. For example, all of the downstream trans interactions for the fend at index 5 in the fend object ‘fends’ array are in trans_data[trans_indices[5]:trans_indices[6], :].
  • frags (filestream) - A filestream to the hdf5 fend file such that all saved fend attributes can be accessed through this class attribute.
  • maxinsert (int.) - An integer denoting the maximum included distance sum between both read ends and their downstream RE site.

When data is loaded the ‘history’ attribute is updated to include the history of the fend file that becomes associated with it.
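
The three-column layout of ‘cis_data’ and ‘trans_data’ (upstream fend, downstream fend, read count) can be illustrated by collapsing individual fend pair observations into counts. This is a sketch of the data layout only, not HiFive's loading code: count_fend_pairs is a hypothetical helper, it uses plain tuples instead of an int32 array, and it applies no validity filtering.

```python
from collections import Counter


def count_fend_pairs(read_pairs):
    """Collapse (fend_a, fend_b) observations into sorted (up, down, count) rows."""
    counts = Counter()
    for a, b in read_pairs:
        # Store each pair with the lower fend index first (the upstream fend)
        up, down = (a, b) if a <= b else (b, a)
        counts[(up, down)] += 1
    # Sort by upstream fend, then downstream fend
    return [(up, down, n) for (up, down), n in sorted(counts.items())]


# Five observed read pairs, given as unordered fend index pairs
reads = [(3, 1), (1, 3), (2, 7), (1, 3), (7, 2)]
rows = count_fend_pairs(reads)
```

Only pairs that were observed at least once produce a row, which is why the arrays contain non-zero counts only.
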

load_data_from_raw(fendfilename, filelist, maxinsert)

Read interaction counts from one or more text files and place them in the h5dict.

Files should contain both mapped ends of a read, one read per line, separated by tabs. Each line should be in the following format:

chromosome1    coordinate1  strand1   chromosome2    coordinate2  strand2

where strands are given by the characters ‘+’ and ‘-’.
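
As an illustration of the line format above, the following sketch parses one tab-separated line into two read ends. parse_raw_line is a hypothetical helper written for this example, not part of HiFive, and the chromosome names in the usage line are made up.

```python
def parse_raw_line(line):
    """Parse one line of the six-column raw format into two read ends.

    Returns ((chrom1, coord1, strand1), (chrom2, coord2, strand2)) with
    coordinates converted to int.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 tab-separated fields, got %d" % len(fields))
    chrom1, coord1, strand1, chrom2, coord2, strand2 = fields
    for strand in (strand1, strand2):
        if strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-', got %r" % strand)
    return (chrom1, int(coord1), strand1), (chrom2, int(coord2), strand2)


end1, end2 = parse_raw_line("chr1\t1000\t+\tchr2\t2500\t-\n")
```
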

Parameters:
  • fendfilename (str.) – This specifies the file name of the Fend object to associate with the dataset.
  • filelist (list) – A list containing all of the file names of mapped read text files to be included in the dataset. If only one file is needed, this may be passed as a string.
  • maxinsert (int.) – A cutoff for filtering paired end reads whose total distance to their respective restriction sites exceeds this value.
Returns:

None

Attributes:
  • fendfilename (str.) - A string containing the relative path of the fend file.
  • cis_data (ndarray) - A numpy array of type int32 and shape N x 3 where N is the number of valid non-zero intra-chromosomal fend pairings observed in the data. The first column contains the fend index (from the ‘fends’ array in the fend object) of the upstream fend, the second column contains the index of the downstream fend, and the third column contains the number of reads observed for that fend pair.
  • cis_indices (ndarray) - A numpy array of type int64 and a length of the number of fends + 1. Each position contains the first entry for the correspondingly-indexed fend in the first column of ‘cis_data’. For example, all of the downstream cis interactions for the fend at index 5 in the fend object ‘fends’ array are in cis_data[cis_indices[5]:cis_indices[6], :].
  • trans_data (ndarray) - A numpy array of type int32 and shape N x 3 where N is the number of valid non-zero inter-chromosomal fend pairings observed in the data. The first column contains the fend index (from the ‘fends’ array in the fend object) of the upstream fend (upstream also refers to the lower indexed chromosome in this context), the second column contains the index of the downstream fend, and the third column contains the number of reads observed for that fend pair.
  • trans_indices (ndarray) - A numpy array of type int64 and a length of the number of fends + 1. Each position contains the first entry for the correspondingly-indexed fend in the first column of ‘trans_data’. For example, all of the downstream trans interactions for the fend at index 5 in the fend object ‘fends’ array are in trans_data[trans_indices[5]:trans_indices[6], :].
  • frags (filestream) - A filestream to the hdf5 fend file such that all saved fend attributes can be accessed through this class attribute.
  • maxinsert (int.) - An integer denoting the maximum included distance sum between both read ends and their downstream RE site.

When data is loaded the ‘history’ attribute is updated to include the history of the fend file that becomes associated with it.
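
The maxinsert cutoff described above can be sketched as follows. This is an interpretation of the documented behavior, not HiFive's implementation: both function names are hypothetical, and the sketch assumes that "downstream" means the next cut site at a higher coordinate for ‘+’ strand reads and at a lower coordinate for ‘-’ strand reads. The cut site coordinates are made up.

```python
from bisect import bisect_left, bisect_right


def distance_to_downstream_site(coord, strand, cut_sites):
    """Distance from a read end to its downstream cut site, or None if there is none.

    cut_sites must be sorted in ascending order (assumed strand convention:
    '+' reads look toward higher coordinates, '-' reads toward lower ones).
    """
    if strand == "+":
        i = bisect_left(cut_sites, coord)
        return cut_sites[i] - coord if i < len(cut_sites) else None
    i = bisect_right(cut_sites, coord) - 1
    return coord - cut_sites[i] if i >= 0 else None


def passes_insert_filter(dist1, dist2, maxinsert):
    """Keep a read pair only if the summed distances do not exceed maxinsert."""
    if dist1 is None or dist2 is None:
        return False
    return dist1 + dist2 <= maxinsert


cut_sites = [100, 500, 900]
d1 = distance_to_downstream_site(450, "+", cut_sites)  # next site upward is 500
d2 = distance_to_downstream_site(620, "-", cut_sites)  # next site downward is 500
keep_loose = passes_insert_filter(d1, d2, maxinsert=200)
keep_strict = passes_insert_filter(d1, d2, maxinsert=150)
```
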

save()

Save analysis parameters to h5dict.

Returns: None