The Fragment class
A class for handling 5C fragment information.
Fragment(filename, mode='r', silent=False)
This class handles restriction enzyme digest-generated fragment data for 5C experiments.
The Fragment class contains all of the genomic information for a 5C experiment, including fragment locations and orientations, chromosome name mapping, and region locations and indices.
This class is also available as hifive.Fragment
When initialized, this class creates an h5dict in which to store all data associated with this object.
- filename (str.) – The file name of the h5dict. This should end with the suffix ‘.hdf5’
- mode (str.) – The mode to open the h5dict with. Use ‘w’ to create or overwrite the h5dict named by filename.
- silent (bool.) – If True, suppresses printed information about function execution for this object.
- file (str.) - A string containing the name of the file, passed at object creation, to which the object is saved.
- silent (bool.) - A boolean indicating whether to suppress all output messages.
- history (str.) - A string containing all of the commands executed on this object and their outcome.
Load fragment data from the h5dict specified at object creation.
Any call of this function will overwrite current object data with values from the last save.
load_fragments(filename, genome_name=None, re_name=None, regions=[], minregionspacing=1000000)
Parse fragment data from a BED file for a 5C assay and store it in an h5dict.
- filename (str.) – A file name to read restriction fragment data from. This should be a BED file containing fragment boundaries for all probed fragments and primer names that match those used for read mapping.
- genome_name (str.) – The name of the species and build. Optional.
- re_name (str.) – The name of the restriction enzyme used to produce the fragment set. Optional.
- regions (list) – User-defined partitioning of fragments into different regions. This argument should be a list of lists containing the chromosome, start, and stop coordinates for each region.
- minregionspacing (int.) – If ‘regions’ is not defined, regions are inferred by inserting a break wherever adjacent fragments are separated by more than this value.
- chromosomes (ndarray) - A numpy array containing chromosome names as strings. The position of the chromosome name in this array is referred to as the chromosome index.
- fragments (ndarray) - A numpy array of length N where N is the number of fragments and containing the fields ‘chr’, ‘start’, ‘stop’, ‘mid’, ‘strand’, ‘region’, and ‘name’. With the exception of the ‘name’ field which is of type string, all of these are of type int32. The ‘chr’ and ‘region’ fields contain the indices of the chromosome and region, respectively. If the bed file used to create the Fragment object contains additional columns, these features are also included as fields with names corresponding to the bed header names. These additional fields are of type float32. Fragments are sorted by chromosome (the order in the ‘chromosomes’ array) and then by coordinates.
- chr_indices (ndarray) - A numpy array with a length of the number of chromosomes in ‘chromosomes’ + 1. This array contains the first position in ‘fragments’ for the chromosome in the corresponding position in the ‘chromosomes’ array. The last position in the array contains the total number of fragments.
- regions (ndarray) - A numpy array of length equal to the number of regions and containing the fields ‘index’, ‘chromosome’, ‘start_frag’, ‘stop_frag’, ‘start’ and ‘stop’. Except for ‘chromosome’, which is a string, all fields are of type int32.
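The relationship between the sorted fragment list, ‘chr_indices’, and gap-based regions can be illustrated with a minimal pure-Python sketch. It is not HiFive's implementation. The function name layout_fragments, the use of plain lists and dicts in place of numpy arrays, alphabetical chromosome ordering, an exclusive ‘stop_frag’, and measuring the gap from the previous fragment's stop to the next fragment's start are all assumptions made for illustration.

```python
def layout_fragments(records, minregionspacing=1000000):
    """Sort fragments and derive chr_indices and regions (illustrative only).

    records: iterable of (chromosome, start, stop) tuples in any order.
    """
    records = list(records)
    # Assumption: chromosomes are ordered alphabetically by name.
    chromosomes = sorted({chrom for chrom, _, _ in records})
    chr_index = {name: i for i, name in enumerate(chromosomes)}

    # Fragments are sorted by chromosome index, then by coordinates.
    fragments = sorted(
        ({"chr": chr_index[c], "start": s, "stop": e, "mid": (s + e) // 2}
         for c, s, e in records),
        key=lambda f: (f["chr"], f["start"], f["stop"]),
    )

    # chr_indices[i] is the first fragment position of chromosome i;
    # the final entry is the total number of fragments.
    chr_indices = [0] * (len(chromosomes) + 1)
    for frag in fragments:
        chr_indices[frag["chr"] + 1] += 1
    for i in range(1, len(chr_indices)):
        chr_indices[i] += chr_indices[i - 1]

    # Start a new region at each chromosome change or wherever the gap
    # between neighbouring fragments exceeds minregionspacing.
    regions = []
    region_start = 0
    for i in range(1, len(fragments) + 1):
        at_end = i == len(fragments)
        if not at_end:
            prev, cur = fragments[i - 1], fragments[i]
            new_region = (cur["chr"] != prev["chr"]
                          or cur["start"] - prev["stop"] > minregionspacing)
        if at_end or new_region:
            first, last = fragments[region_start], fragments[i - 1]
            regions.append({
                "index": len(regions),
                "chromosome": chromosomes[first["chr"]],
                "start_frag": region_start,
                "stop_frag": i,  # assumption: exclusive upper bound
                "start": first["start"],
                "stop": last["stop"],
            })
            region_start = i

    # Record the region index on every fragment.
    for region in regions:
        for frag in fragments[region["start_frag"]:region["stop_frag"]]:
            frag["region"] = region["index"]
    return chromosomes, fragments, chr_indices, regions


records = [("chr2", 5000, 6000), ("chr1", 100, 900),
           ("chr1", 1000, 1800), ("chr1", 3000000, 3000800)]
chromosomes, fragments, chr_indices, regions = layout_fragments(records)
```

With these four made-up fragments, the three ‘chr1’ fragments sort ahead of the single ‘chr2’ fragment, and the large gap on ‘chr1’ splits that chromosome into two regions, so three regions are produced in total.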
Save fragment data to h5dict.
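A typical sequence of calls might look like the sketch below. It relies only on the constructor and load_fragments signatures documented above, plus a save() method name taken from the HiFive package rather than from this page. The wrapper function build_fragment_file, the file paths, the genome name ‘mm9’, and the enzyme name ‘HindIII’ are hypothetical placeholders.

```python
def build_fragment_file(bed_path, out_path):
    """Create a 5C fragment h5dict from a BED file (illustrative sketch)."""
    # Imported here so this module loads even without HiFive installed.
    import hifive

    # mode='w' creates or overwrites the h5dict named by out_path.
    frags = hifive.Fragment(out_path, mode='w', silent=True)

    # No explicit regions are given, so breaks are placed wherever
    # fragments are more than minregionspacing apart.
    frags.load_fragments(
        bed_path,
        genome_name='mm9',     # placeholder species/build name
        re_name='HindIII',     # placeholder restriction enzyme name
        minregionspacing=1000000,
    )

    # Write the parsed fragment data to the h5dict.
    frags.save()
    return frags
```

A call such as build_fragment_file('probes.bed', 'fragments.hdf5') would then produce the file; both names are examples only, and the output name follows the ‘.hdf5’ suffix convention noted for the filename parameter.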