The FiveC class

A class for handling 5C analysis.

class hifive.fivec.FiveC(filename, mode='r', silent=False)

This is the class for handling 5C analysis.

This class relies on Fragment and FiveCData for genomic position and interaction count data. Use this class to perform filtering of fragments based on coverage, model fragment bias and distance dependence, and downstream analysis and manipulation. This includes binning of data, plotting of data, and statistical analysis.

Note

This class is also available as hifive.FiveC

When initialized, this class creates an h5dict in which to store all data associated with this object.

Parameters:
  • filename (str.) – The file name of the h5dict. This should end with the suffix ‘.hdf5’
  • mode (str.) – The mode to open the h5dict with. This should be ‘w’ for creating or overwriting an h5dict with name given in filename.
  • silent (bool.) – Indicates whether to print information about function execution for this object.
Returns:

FiveC class object.

Attributes:
  • file (str.) - A string containing the name of the file passed during object creation for saving the object to.
  • silent (bool.) - A boolean indicating whether to suppress all of the output messages.
  • history (str.) - A string containing all of the commands executed on this object and their outcome.
  • normalization (str.) - A string stating which type of normalization has been performed on this object. This starts with the value ‘none’.

In addition, many other attributes are initialized to the ‘None’ state.
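
A typical session chains the methods documented below. The following is a minimal sketch only: it assumes HiFive is installed, the function name `run_fivec_analysis` and both file-name arguments are hypothetical, and the choice of normalization (express, with distance removal) is just one of the options this class offers.

```python
def run_fivec_analysis(project_fname, data_fname):
    """Create a FiveC project, normalize it, and return a cis heatmap.

    Both file names are placeholders supplied by the caller; data_fname
    must point to an existing FiveCData file.
    """
    import hifive  # imported here so the sketch can be loaded without HiFive

    # 'w' creates (or overwrites) the project h5dict.
    fivec = hifive.FiveC(project_fname, 'w')
    fivec.load_data(data_fname)

    # Drop fragments with too few interactions, then model distance dependence.
    fivec.filter_fragments(mininteractions=20)
    fivec.find_distance_parameters()

    # One of several normalization choices described in this class.
    fivec.find_express_fragment_corrections(iterations=1000,
                                            remove_distance=True)
    fivec.save()

    # Unbinned enrichment data for the first region as a full array.
    return fivec.cis_heatmap(region=0, binsize=0, datatype='enrichment',
                             arraytype='full')
```

The call order follows the descriptions below: fragments are filtered before the distance parameters are fit, and the distance parameters are fit before a correction method that uses them is run.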

cis_heatmap(region, binsize=0, binbounds=None, start=None, stop=None, startfrag=None, stopfrag=None, datatype='enrichment', arraytype='full', skipfiltered=False, returnmapping=False, dynamically_binned=False, minobservations=0, searchdistance=0, expansion_binsize=0, removefailed=False, image_file=None, **kwargs)

Return a heatmap of cis data of the type and shape specified by the passed arguments.

This function returns a heatmap for a single region, bounded by either ‘start’ and ‘stop’ or ‘startfrag’ and ‘stopfrag’ (‘start’ and ‘stop’ take precedence). If neither is given, the complete region is included. The data in the array is determined by the ‘datatype’, being raw, fragment-corrected, distance-corrected, enrichment, or expected data. The array shape is given by ‘arraytype’ and can be compact (if unbinned), upper, or full. See fivec_binning for further explanation of ‘datatype’ and ‘arraytype’. If using dynamic binning (‘dynamically_binned’ is set to True), ‘minobservations’, ‘searchdistance’, ‘expansion_binsize’, and ‘removefailed’ are used to control the dynamic binning process. Otherwise these arguments are ignored.

Parameters:
  • region (int.) – The index of the region to obtain data from.
  • binsize (int.) – This is the coordinate width of each bin. If ‘binsize’ is zero, unbinned data is returned.
  • binbounds (numpy array) – An array containing start and stop coordinates for a set of user-defined bins. Any fragment not falling in a bin is ignored.
  • start (int.) – The smallest coordinate to include in the array, measured from fragment midpoints. If both ‘start’ and ‘startfrag’ are given, ‘start’ will override ‘startfrag’. If unspecified, this will be set to the midpoint of the first fragment for ‘region’. Optional.
  • stop (int.) – The largest coordinate to include in the array, measured from fragment midpoints. If both ‘stop’ and ‘stopfrag’ are given, ‘stop’ will override ‘stopfrag’. If unspecified, this will be set to the midpoint of the last fragment plus one for ‘region’. Optional.
  • startfrag (int.) – The first fragment to include in the array. If unspecified and ‘start’ is not given, this is set to the first fragment in ‘region’. In cases where ‘start’ is specified and conflicts with ‘startfrag’, ‘start’ is given preference. Optional.
  • stopfrag (int.) – The first fragment not to include in the array. If unspecified and ‘stop’ is not given, this is set to the last fragment in ‘region’ plus one. In cases where ‘stop’ is specified and conflicts with ‘stopfrag’, ‘stop’ is given preference. Optional.
  • datatype (str.) – This specifies the type of data that is processed and returned. Options are ‘raw’, ‘distance’, ‘fragment’, ‘enrichment’, and ‘expected’. Observed values are always in the first index along the last axis, except when ‘datatype’ is ‘expected’. In this case, filter values replace counts. Conversely, if ‘raw’ is specified, unfiltered fragments return a value of one. Expected values are returned for ‘distance’, ‘fragment’, ‘enrichment’, and ‘expected’ values of ‘datatype’. ‘distance’ uses only the expected signal given distance for calculating the expected values, ‘fragment’ uses only fragment correction values, and both ‘enrichment’ and ‘expected’ use both correction and distance mean values.
  • arraytype (str.) – This determines what shape of array data are returned in. Acceptable values are ‘compact’ (if unbinned), ‘full’, and ‘upper’. ‘compact’ means data are arranged in a N x M x 2 array where N and M are the number of forward and reverse probe fragments, respectively. ‘full’ returns a square, symmetric array of size N x N x 2 where N is the total number of fragments or bins. ‘upper’ returns only the flattened upper triangle of a full array, excluding the diagonal, with size (N * (N - 1) / 2) x 2, where N is the total number of fragments or bins.
  • skipfiltered (bool.) – If True, all interaction bins for filtered out fragments are removed and a reduced-size array is returned.
  • returnmapping (bool.) – If True, a list is returned containing the data array and mapping information: a 1d array of the fragment numbers included in the data array if the array is not compact, or two 1d arrays containing the fragment numbers for forward and reverse fragments if the array is compact. Otherwise only the data array is returned.
  • dynamically_binned (bool.) – If True, return dynamically binned data.
  • minobservations (int.) – The fewest number of observed reads needed for a bin to be counted as valid and stop expanding.
  • searchdistance (int.) – The furthest distance from the bin midpoint to expand bounds. If this is set to zero, there is no limit on expansion distance.
  • expansion_binsize (int.) – The size of bins to use for data to pull from when expanding dynamic bins. If set to zero, unbinned data is used.
  • removefailed (bool.) – If a non-zero ‘searchdistance’ is given, it is possible for a bin not to meet the ‘minobservations’ criteria before stopping looking. If this occurs and ‘removefailed’ is True, the observed and expected values for that bin are zero.
  • image_file (str.) – If a filename is specified, a PNG image file is written containing the heatmap data. Arguments for the appearance of the image can be passed as additional keyword arguments.
Returns:

Array in the format requested with ‘arraytype’ containing data requested with ‘datatype’. If ‘returnmapping’ is True, a list is returned containing the requested data array and an array of associated positions (dependent on the binning options selected).
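
The relationship between the ‘full’ and ‘upper’ layouts, and the use of the two values along the last axis, can be sketched with plain nested lists. This is an illustration rather than HiFive code: the helper names are hypothetical, the row-major ordering of the flattened triangle is an assumption, and the log2 ratio is just one common way to summarize observed versus expected values.

```python
import math

def full_to_upper(full):
    """Flatten the upper triangle (diagonal excluded) of an N x N x 2 nested list.

    Assumes row-major order: (0,1), (0,2), ..., (1,2), ...
    """
    n = len(full)
    return [full[i][j] for i in range(n) for j in range(i + 1, n)]

def log2_enrichment(pairs):
    """Return log2(observed / expected) for each [observed, expected] pair.

    Pairs where either value is not positive yield None.
    """
    result = []
    for observed, expected in pairs:
        if observed > 0 and expected > 0:
            result.append(math.log2(float(observed) / expected))
        else:
            result.append(None)
    return result

# A toy 3 x 3 x 2 'full' array: [observed, expected] for each fragment pair.
full = [
    [[0, 0.0], [8, 2.0], [1, 4.0]],
    [[8, 2.0], [0, 0.0], [0, 3.0]],
    [[1, 4.0], [0, 3.0], [0, 0.0]],
]
upper = full_to_upper(full)
enrichment = log2_enrichment(upper)
```

For N = 3 the flattened form has N * (N - 1) / 2 = 3 entries, matching the size given for ‘upper’ above.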

filter_fragments(mininteractions=20, mindistance=0, maxdistance=0)

Iterate over the dataset and remove fragments that do not have ‘mininteractions’ interactions, using only unfiltered fragments and interactions falling within the distance limits specified.

In order to create a set of fragments that all have the necessary number of interactions, after each round of filtering, fragment interactions are retallied using only interactions that have unfiltered fragments at both ends.

Parameters:
  • mininteractions (int.) – The required number of interactions for keeping a fragment in analysis.
  • mindistance (int.) – The minimum inter-fragment distance to be included in filtering.
  • maxdistance (int.) – The maximum inter-fragment distance to be included in filtering. A value of zero indicates no maximum cutoff.
Returns:

None
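
The retallying loop described above can be sketched as follows. This is a simplified illustration, not the HiFive implementation: the function name `iterative_filter` is hypothetical, each observed fragment pair is assumed to count as one interaction, and the distance limits are omitted.

```python
def iterative_filter(pairs, num_frags, mininteractions):
    """Return a keep/drop flag per fragment after iterative filtering.

    pairs is a list of (frag1, frag2) index tuples, one per observed
    interaction (an assumption made for this sketch).
    """
    keep = [True] * num_frags
    while True:
        # Retally using only interactions with unfiltered fragments at both ends.
        tally = [0] * num_frags
        for i, j in pairs:
            if keep[i] and keep[j]:
                tally[i] += 1
                tally[j] += 1
        failed = [f for f in range(num_frags)
                  if keep[f] and tally[f] < mininteractions]
        if not failed:
            return keep
        for f in failed:
            keep[f] = False

# Removing fragment 3 leaves the others with enough interactions.
stable = iterative_filter([(0, 1), (0, 2), (1, 2), (2, 3)], 4, 2)

# Here removing the end fragments starves the middle ones in the next round.
cascade = iterative_filter([(0, 1), (1, 2), (2, 3)], 4, 2)
```

The second call shows why a single pass is not enough: fragments 1 and 2 pass the first round but fail once their partners have been removed.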

find_binning_fragment_corrections(mindistance=0, maxdistance=0, model=['gc', 'len'], num_bins=[10, 10], parameters=['even', 'even'], learning_threshold=1.0, max_iterations=100, usereads='cis', regions=[], precorrect=False)

Using a multivariate binning model, learn correction values for combinations of model parameter bins.

Parameters:
  • mindistance (int.) – The minimum inter-fragment distance to be included in modeling.
  • maxdistance (int.) – The maximum inter-fragment distance to be included in modeling.
  • model (list) – A list of fragment features to be used in model. Valid values are ‘len’ and any features included in the creation of the associated Fragment object.
  • num_bins (list) – A list of the number of approximately equal-sized bins to divide each model component into, one value per model parameter.
  • parameters (list) – A list of types, one for each model parameter. Types can be either ‘even’ or ‘fixed’, indicating whether each parameter bin should contain approximately even numbers of interactions or be of fixed width spanning 1 / Nth of the range of the parameter’s values, respectively. Parameter types can also have the suffix ‘-const’ to indicate that the parameter should not be optimized.
  • remove_distance (bool.) – Use distance dependence curve in prior probability calculation for each observation.
  • learning_threshold (float) – The minimum change in log-likelihood needed to continue iterative learning process.
  • max_iterations (int.) – The maximum number of iterations to use for learning model parameters.
  • usereads (str.) – Specifies which set of interactions to use, ‘cis’, ‘trans’, or ‘all’.
  • regions (list) – A list of regions to calculate corrections for. If set as None, all region corrections are found.
  • precorrect (bool.) – Use fragment-based corrections in expected value calculations, resulting in a chained normalization approach.
Returns:

None

Attributes:
  • model_parameters (ndarray) - A numpy array of strings containing model parameter names.
  • binning_num_bins (ndarray) - A numpy array of type int32 containing the number of bins for each model parameter.
  • binning_corrections (ndarray) - A numpy array of type float32 and length equal to the sum of binning_num_bins * (binning_num_bins - 1) / 2. This array contains a 1D stack of correction values, ordered according to the parameter order in the ‘model_parameters’ attribute.
  • binning_correction_indices (ndarray) - A numpy array of type int32 and length equal to the number of model parameters plus one. This array contains the first position in ‘binning_corrections’ for the first bin of the model parameter in the corresponding position in the ‘model_parameters’ array. The last position in the array contains the total number of binning correction values.
  • binning_frag_indices (ndarray) - A numpy array of type int32 and size N x M where M is the number of model parameters and N is the number of fragments. This array contains the binning index for each parameter for each fragment.

The ‘normalization’ attribute is updated to ‘binning’, ‘probability-binning’, or ‘express-binning’, depending on if the ‘precorrect’ option is selected and which normalization has been previously run.

find_distance_parameters()

Regress log counts versus inter-fragment distances to find slope and intercept values and then find the standard deviation of corrected counts.

Returns:

None

Attributes:
  • gamma (float) - A float denoting the negative slope of the distance-dependence regression line.
  • sigma (float) - A float denoting the standard deviation of nonzero data about the distance-dependence regression line.
  • region_means (ndarray) - A numpy array of type float32 and length equal to the number of regions. This is initialized to zeros until fragment correction values are found.
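
The regression described above can be sketched with an ordinary least-squares fit. This is an illustration under stated assumptions, not the HiFive code: the function name `fit_distance_dependence` is hypothetical, distances are assumed to be log-transformed along with the counts, and zero counts are simply skipped.

```python
import math

def fit_distance_dependence(distances, counts):
    """Fit log(count) = intercept - gamma * log(distance) by least squares.

    Returns (gamma, intercept, sigma), where gamma is the negative slope and
    sigma is the standard deviation of nonzero data about the fitted line.
    """
    # Only nonzero counts at positive distances can be log-transformed.
    points = [(math.log(d), math.log(c))
              for d, c in zip(distances, counts) if c > 0 and d > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in points]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    return -slope, intercept, sigma

# Synthetic counts that decay exactly as distance ** -0.8.
distances = [1000.0, 2000.0, 4000.0, 8000.0]
counts = [1000.0 * d ** -0.8 for d in distances]
gamma, intercept, sigma = fit_distance_dependence(distances, counts)
```

With noise-free synthetic data the fit recovers a gamma of about 0.8 and a sigma near zero; real data would give a nonzero sigma.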

find_express_fragment_corrections(mindistance=0, maxdistance=0, iterations=1000, remove_distance=False, usereads='cis', regions=[], precorrect=False, logged=True, kr=False)

Using iterative approximation, learn correction values for each valid fragment.

Parameters:
  • mindistance (int.) – The minimum inter-fragment distance to be included in modeling.
  • maxdistance (int.) – The maximum inter-fragment distance to be included in modeling.
  • iterations (int.) – The number of iterations to use for learning fragment corrections.
  • remove_distance (bool.) – Specifies whether the estimated distance-dependent portion of the signal is removed prior to learning fragment corrections.
  • usereads (str.) – Specifies which set of interactions to use, ‘cis’, ‘trans’, or ‘all’.
  • regions (list) – A list of regions to calculate corrections for. If set as None, all region corrections are found.
  • precorrect (bool.) – Use binning-based corrections in expected value calculations, resulting in a chained normalization approach.
  • logged (bool.) – Use log-counts instead of counts for learning.
  • kr (bool.) – Use the Knight Ruiz matrix balancing algorithm instead of weighted matrix balancing. This option ignores ‘iterations’ and ‘logged’.
Returns:

None

Calling this function creates the following attributes:

Attributes:
  • corrections (ndarray) - A numpy array of type float32 and length equal to the number of fragments. All invalid fragments have an associated correction value of zero.

The ‘normalization’ attribute is updated to ‘express’ or ‘binning-express’, depending on if the ‘precorrect’ option is selected. In addition, if the ‘remove_distance’ option is selected, the ‘region_means’ attribute is updated such that the mean correction (sum of all valid regional correction value pairs) is adjusted to zero and the corresponding region mean is adjusted the same amount but the opposite sign.
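
The general idea of learning one correction per fragment by iterative approximation can be illustrated with a generic matrix-balancing loop. The sketch below is not HiFive's algorithm: it is a plain damped iterative scaling on a small dense symmetric matrix, corrections are treated as multiplicative, and the function name `balance` and the square-root damping are assumptions chosen for illustration.

```python
import math

def balance(matrix, iterations=50):
    """Estimate one multiplicative bias per row of a symmetric count matrix.

    Every row is assumed to have a positive sum; empty (filtered) rows
    would need to be excluded before calling this.
    """
    n = len(matrix)
    bias = [1.0] * n
    for _ in range(iterations):
        # Row sums after dividing out the current bias estimates.
        sums = [sum(matrix[i][j] / (bias[i] * bias[j]) for j in range(n))
                for i in range(n)]
        mean = sum(sums) / n
        # Damped update: scale each bias by the square root of its row's excess.
        bias = [b * math.sqrt(s / mean) for b, s in zip(bias, sums)]
    return bias

# Toy counts generated purely from known per-fragment biases.
true_bias = [0.5, 1.0, 2.0, 4.0]
counts = [[a * b for b in true_bias] for a in true_bias]
bias = balance(counts)
corrected = [[counts[i][j] / (bias[i] * bias[j]) for j in range(4)]
             for i in range(4)]
row_sums = [sum(row) for row in corrected]
```

After balancing, every corrected row has the same sum, and the learned biases match the true ones up to a common scale factor.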

find_probability_fragment_corrections(mindistance=0, maxdistance=0, max_iterations=1000, minchange=0.0005, learningstep=0.1, precalculate=True, regions=[], precorrect=False)

Using gradient descent, learn correction values for each valid fragment based on a Log-Normal distribution of observations.

Parameters:
  • mindistance (int.) – The minimum inter-fragment distance to be included in modeling.
  • maxdistance (int.) – The maximum inter-fragment distance to be included in modeling.
  • max_iterations (int.) – The maximum number of gradient descent iterations to perform.
  • minchange (float) – The cutoff threshold for early learning termination for the maximum absolute gradient value.
  • learningstep (float) – The scaling factor by which the learning rate is decreased if a step doesn’t meet the Armijo criterion.
  • precalculate (bool.) – Specifies whether the correction values should be initialized at the fragment means.
  • regions (list) – A list of regions to calculate corrections for. If set as None, all region corrections are found.
  • precorrect (bool.) – Use binning-based corrections in expected value calculations, resulting in a chained normalization approach.
Returns:

None

Attributes:
  • corrections (ndarray) - A numpy array of type float32 and length equal to the number of fragments. All invalid fragments have an associated correction value of zero.

The ‘normalization’ attribute is updated to ‘probability’ or ‘binning-probability’, depending on if the ‘precorrect’ option is selected. In addition, the ‘region_means’ attribute is updated such that the mean correction (sum of all valid regional correction value pairs) is adjusted to zero and the corresponding region mean is adjusted the same amount but the opposite sign.

find_trans_mean()

Calculate the mean signal across all valid fragment-pair trans (inter-region) interactions.

Returns:

None

Attributes:
  • trans_mean (float) - A float corresponding to the mean signal of inter-region interactions.

load()

Load analysis parameters from h5dict specified at object creation and open h5dicts for associated FiveCData and Fragment objects.

Any call of this function will overwrite current object data with values from the last save() call.

Returns:

None

load_data(filename)

Load fragment-pair counts and fragment object from FiveCData object.

Parameters:

filename (str.) – Specifies the file name of the FiveCData object to associate with this analysis.

Returns:

None

Attributes:
  • datafilename (str.) - A string containing the relative path of the FiveCData file.
  • fragfilename (str.) - A string containing the relative path of the Fragment file associated with the FiveCData file.
  • frags (filestream) - A filestream to the hdf5 Fragment file such that all saved Fragment attributes can be accessed through this class attribute.
  • data (filestream) - A filestream to the hdf5 FiveCData file such that all saved FiveCData attributes can be accessed through this class attribute.
  • chr2int (dict.) - A dictionary that converts chromosome names to chromosome indices.
  • filter (ndarray) - A numpy array of type int32 and size N where N is the number of fragments. This contains the inclusion status of each fragment with a one indicating included and zero indicating excluded and is initialized with all fragments included.

When a FiveCData object is associated with the project file, the ‘history’ attribute is updated with the history of the FiveCData object.

save(out_fname=None)

Save analysis parameters to h5dict.

Parameters:
  • out_fname (str.) – Specifies the file name of the FiveC object to save this analysis to.
Returns:

None

trans_heatmap(region1, region2, binsize=1000000, binbounds1=None, start1=None, stop1=None, startfrag1=None, stopfrag1=None, binbounds2=None, start2=None, stop2=None, startfrag2=None, stopfrag2=None, datatype='enrichment', arraytype='full', returnmapping=False, dynamically_binned=False, minobservations=0, searchdistance=0, expansion_binsize=0, removefailed=False, skipfiltered=False, image_file=None, **kwargs)

Return a heatmap of trans data of the type and shape specified by the passed arguments.

This function returns a heatmap for trans interactions between two regions, bounded by either ‘start1’, ‘stop1’, ‘start2’ and ‘stop2’ or ‘startfrag1’, ‘stopfrag1’, ‘startfrag2’, and ‘stopfrag2’ (‘start’ and ‘stop’ take precedence). The data in the array is determined by the ‘datatype’, being raw, fragment-corrected, distance-corrected, enrichment, or expected data. The array shape is always rectangular but can be either compact (which returns two arrays) or full. See fivec_binning for further explanation of ‘datatype’ and ‘arraytype’. If using dynamic binning (‘dynamically_binned’ is set to True), ‘minobservations’, ‘searchdistance’, ‘expansion_binsize’, and ‘removefailed’ are used to control the dynamic binning process. Otherwise these arguments are ignored.

Parameters:
  • region1 (int.) – The index of the first region to obtain data from.
  • region2 (int.) – The index of the second region to obtain data from.
  • binsize (int.) – This is the coordinate width of each bin.
  • binbounds1 (numpy array) – An array containing start and stop coordinates for a set of user-defined bins for ‘region1’. Any fragment not falling in a bin is ignored.
  • start1 (int.) – The coordinate at the beginning of the smallest bin from ‘region1’. If unspecified, ‘start1’ will be the first multiple of ‘binsize’ below the ‘startfrag1’ mid. If there is a conflict between ‘start1’ and ‘startfrag1’, ‘start1’ is given preference. Optional.
  • stop1 (int.) – The largest coordinate to include in the array from ‘region1’, measured from fragment midpoints. If both ‘stop1’ and ‘stopfrag1’ are given, ‘stop1’ will override ‘stopfrag1’. ‘stop1’ will be shifted higher as needed to make the last bin of size ‘binsize’. Optional.
  • startfrag1 (int.) – The first fragment from ‘region1’ to include in the array. If unspecified and ‘start1’ is not given, this is set to the first valid fragment in ‘region1’. In cases where ‘start1’ is specified and conflicts with ‘startfrag1’, ‘start1’ is given preference. Optional.
  • stopfrag1 (int.) – The first fragment not to include in the array from ‘region1’. If unspecified and ‘stop1’ is not given, this is set to the last valid fragment in ‘region1’ + 1. In cases where ‘stop1’ is specified and conflicts with ‘stopfrag1’, ‘stop1’ is given preference. Optional.
  • binbounds2 (numpy array) – An array containing start and stop coordinates for a set of user-defined bins for ‘region2’. Any fragment not falling in a bin is ignored.
  • start2 (int.) – The coordinate at the beginning of the smallest bin from ‘region2’. If unspecified, ‘start2’ will be the first multiple of ‘binsize’ below the ‘startfrag2’ mid. If there is a conflict between ‘start2’ and ‘startfrag2’, ‘start2’ is given preference. Optional.
  • stop2 (int.) – The largest coordinate to include in the array from ‘region2’, measured from fragment midpoints. If both ‘stop2’ and ‘stopfrag2’ are given, ‘stop2’ will override ‘stopfrag2’. ‘stop2’ will be shifted higher as needed to make the last bin of size ‘binsize’. Optional.
  • startfrag2 (int.) – The first fragment from ‘region2’ to include in the array. If unspecified and ‘start2’ is not given, this is set to the first valid fragment in ‘region2’. In cases where ‘start2’ is specified and conflicts with ‘startfrag2’, ‘start2’ is given preference. Optional.
  • stopfrag2 (int.) – The first fragment not to include in the array from ‘region2’. If unspecified and ‘stop2’ is not given, this is set to the last valid fragment in ‘region2’ + 1. In cases where ‘stop2’ is specified and conflicts with ‘stopfrag2’, ‘stop2’ is given preference. Optional.
  • datatype (str.) – This specifies the type of data that is processed and returned. Options are ‘raw’, ‘distance’, ‘fragment’, ‘enrichment’, and ‘expected’. Observed values are always in the first index along the last axis, except when ‘datatype’ is ‘expected’. In this case, filter values replace counts. Conversely, if ‘raw’ is specified, non-filtered bins return a value of 1. Expected values are returned for ‘distance’, ‘fragment’, ‘enrichment’, and ‘expected’ values of ‘datatype’. ‘distance’ uses only the expected signal given distance for calculating the expected values, ‘fragment’ uses only fragment correction values, and both ‘enrichment’ and ‘expected’ use both correction and distance mean values.
  • arraytype (str.) – This determines what shape of array data are returned in. Acceptable values are ‘compact’ (if unbinned) and ‘full’. ‘compact’ means data are arranged in a N x M x 2 array where N and M are the number of forward and reverse probe fragments, respectively. If compact is selected, only data for the forward primers of ‘region1’ and reverse primers of ‘region2’ are returned. ‘full’ returns a square, symmetric array of size N x N x 2 where N is the total number of fragments or bins.
  • returnmapping (bool.) – If ‘True’, a list containing the data array and mapping information is returned. Otherwise only the data array(s) are returned.
  • dynamically_binned (bool.) – If ‘True’, return dynamically binned data.
  • minobservations (int.) – The fewest number of observed reads needed for a bin to be counted as valid and stop expanding.
  • searchdistance (int.) – The furthest distance from the bin midpoint to expand bounds. If this is set to zero, there is no limit on expansion distance.
  • expansion_binsize (int.) – The size of bins to use for data to pull from when expanding dynamic bins. If set to zero, unbinned data is used.
  • removefailed (bool.) – If a non-zero ‘searchdistance’ is given, it is possible for a bin not to meet the ‘minobservations’ criteria before stopping looking. If this occurs and ‘removefailed’ is True, the observed and expected values for that bin are zero.
  • skipfiltered (bool.) – If ‘True’, all interaction bins for filtered out fragments are removed and a reduced-size array is returned.
  • image_file (str.) – If a filename is specified, a PNG image file is written containing the heatmap data. Arguments for the appearance of the image can be passed as additional keyword arguments.
Returns:

Array in the format requested with ‘arraytype’ containing inter-region data requested with ‘datatype’. If ‘returnmapping’ is True, a list is returned with mapping information. If ‘arraytype’ is ‘full’, a single data array and two 1d arrays of fragments corresponding to rows and columns, respectively, are returned. If ‘arraytype’ is ‘compact’, two data arrays are returned (forward1 by reverse2 and forward2 by reverse1) along with forward and reverse fragment positions for each array, for a total of 5 arrays.

write_heatmap(filename, binsize, includetrans=True, datatype='enrichment', arraytype='full', regions=[], dynamically_binned=False, minobservations=0, searchdistance=0, expansion_binsize=0, removefailed=False)

Create an h5dict file containing binned interaction arrays, bin positions, and an index of included regions.

Parameters:
  • filename (str.) – Location to write h5dict object to.
  • binsize (int.) – Size of bins for interaction arrays. If “binsize” is zero, fragment interactions are returned without binning.
  • includetrans (bool.) – Indicates whether trans interaction arrays should be calculated and saved.
  • datatype (str.) – This specifies the type of data that is processed and returned. Options are ‘raw’, ‘distance’, ‘fragment’, ‘enrichment’, and ‘expected’. Observed values are always in the first index along the last axis, except when ‘datatype’ is ‘expected’. In this case, filter values replace counts. Conversely, if ‘raw’ is specified, non-filtered bins return a value of 1. Expected values are returned for ‘distance’, ‘fragment’, ‘enrichment’, and ‘expected’ values of ‘datatype’. ‘distance’ uses only the expected signal given distance for calculating the expected values, ‘fragment’ uses only fragment correction values, and both ‘enrichment’ and ‘expected’ use both correction and distance mean values.
  • arraytype (str.) – This determines what shape of array data are returned in. Acceptable values are ‘compact’ and ‘full’. ‘compact’ means data are arranged in a N x M x 2 array where N is the number of bins, M is the maximum number of steps between included bin pairs, and data are stored such that bin n,m contains the interaction values between n and n + m + 1. ‘full’ returns a square, symmetric array of size N x N x 2.
  • regions (list.) – If given, indicates which regions should be included. If left empty, all regions are included.
  • dynamically_binned (bool.) – If ‘True’, return dynamically binned data.
  • minobservations (int.) – The fewest number of observed reads needed for a bin to be counted as valid and stop expanding.
  • searchdistance (int.) – The furthest distance from the bin midpoint to expand bounds. If this is set to zero, there is no limit on expansion distance.
  • expansion_binsize (int.) – The size of bins to use for data to pull from when expanding dynamic bins. If set to zero, unbinned data is used.
  • removefailed (bool.) – If a non-zero ‘searchdistance’ is given, it is possible for a bin not to meet the ‘minobservations’ criteria before stopping looking. If this occurs and ‘removefailed’ is True, the observed and expected values for that bin are zero.
Returns:

None

The following attributes are created within the hdf5 dictionary file. Arrays are accessible as datasets while the resolution is held as an attribute.

Attributes:
  • resolution (int.) - The bin size that data are accumulated in.
  • regions (ndarray) - A numpy array containing region data for each region included in the heatmaps.
  • N.positions (ndarray) - A series of numpy arrays of type int32, one for each region where N is the region index, containing one row for each bin and four columns denoting the start and stop coordinates and first fragment and last fragment plus one for each bin. This is included if data is in the ‘full’ format.
  • N.forward_positions (ndarray) - A series of numpy arrays of type int32, one for each region where N is the region index, containing one row for each bin along the first axis and four columns denoting the start and stop coordinates and first fragment and last fragment plus one for each bin. This is included if data is in the ‘compact’ format and corresponds to only forward strand fragments.
  • N.reverse_positions (ndarray) - A series of numpy arrays of type int32, one for each region where N is the region index, containing one row for each bin along the second axis and four columns denoting the start and stop coordinates and first fragment and last fragment plus one for each bin. This is included if data is in the ‘compact’ format and corresponds to only reverse strand fragments.
  • N.counts (ndarray) - A series of numpy arrays of type int32, one for each region where N is the region index, containing the observed counts for valid fragment combinations. If arrays are in the ‘compact’ format, the first axis corresponds to forward fragments and the second axis corresponds to reverse fragments. If the array is in the ‘upper’ format, data are in an upper-triangle format such that they have N * (N - 1) / 2 entries where N is the number of fragments or bins in the region.
  • N.expected (ndarray) - A series of numpy arrays of type float32, one for each region where N is the region index, containing the expected counts for valid fragment combinations. If the array is in the ‘upper’ format, data are in an upper-triangle format such that they have N * (N - 1) / 2 entries where N is the number of fragments or bins in the region.
  • N_by_M.counts (ndarray) - A series of numpy arrays of type int32, one for each region pair N and M if trans data are included, containing the observed counts for valid fragment combinations. The region index order specifies which axis corresponds to which region. If data are in the ‘compact’ format, both region index orders will be present.
  • N_by_M.expected (ndarray) - A series of numpy arrays of type float32, one for each region pair N and M if trans data are included, containing the expected counts for valid fragment combinations. The region index order specifies which axis corresponds to which region. If data are in the ‘compact’ format, both region index orders will be present.
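
A file written by write_heatmap can be read back with any HDF5 library. The sketch below uses h5py and is an assumption-laden illustration: the helper names are hypothetical, N and M are assumed to be written as plain integer region indices in the dataset names, and ‘resolution’ is assumed to be stored as an attribute of the file root.

```python
def cis_dataset_name(region, kind):
    """Build a dataset name such as '0.counts' (naming pattern assumed)."""
    return '%i.%s' % (region, kind)

def trans_dataset_name(region1, region2, kind):
    """Build a dataset name such as '0_by_1.expected' (naming pattern assumed)."""
    return '%i_by_%i.%s' % (region1, region2, kind)

def read_cis_heatmap(filename, region):
    """Return (resolution, counts, expected) for one region of a heatmap file."""
    import h5py  # third-party; imported here so the helpers above work without it

    with h5py.File(filename, 'r') as infile:
        resolution = infile.attrs['resolution']
        counts = infile[cis_dataset_name(region, 'counts')][...]
        expected = infile[cis_dataset_name(region, 'expected')][...]
    return resolution, counts, expected
```

Reading with `[...]` copies each dataset into memory as a numpy array, so the file can be closed before the arrays are used.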