HiFive data is handled using the
Loading 5C data
HiFive can load 5C data from either of two source file types: BAM files or counts files.
BAM files used to load 5C data should always come in pairs, one file for each end of the paired-end reads. HiFive can load any number of BAM file pairs, such as when multiple sequencing lanes have been run for a single replicate. These files do not need to be indexed or sorted. The names of all sequences that these files were mapped against should exactly match the primer names in the BED file used to construct the Fragment object.
Counts files are tabular text files in which each line contains a pair of primer names followed by the number of observed occurrences of that pairing.
    5c_for_primer1 5c_rev_primer2 10
    5c_for_primer1 5c_rev_primer4 3
    5c_for_primer3 5c_rev_primer4 18
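To make the counts layout concrete, here is a minimal sketch of reading such lines into a dictionary keyed by primer pair. It is an illustration only, not HiFive's own loader; the function name `parse_counts` is hypothetical, and summing repeated pairs is an assumption about how duplicates might be handled.

```python
from collections import defaultdict


def parse_counts(lines):
    """Sum observed counts per (primer1, primer2) pair from counts-file lines.

    Hypothetical helper for illustration; not part of HiFive.
    """
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        primer1, primer2, count = fields
        # Assumption: a pair listed more than once has its counts added together.
        counts[(primer1, primer2)] += int(count)
    return dict(counts)


example = """5c_for_primer1 5c_rev_primer2 10
5c_for_primer1 5c_rev_primer4 3
5c_for_primer3 5c_rev_primer4 18"""

counts = parse_counts(example.splitlines())
```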
Loading HiC data
HiFive can load HiC data from four different types of source files: BAM, RAW, MAT, and matrix files.
BAM files used to load HiC data should always come in pairs, one file for each end of the paired-end reads. HiFive can load any number of BAM file pairs, such as when multiple sequencing lanes have been run for a single replicate. These files do not need to be indexed or sorted. For faster loading, especially with very large numbers of reads, it helps to filter out single-mapped reads beforehand, which reduces the number of reads that HiFive needs to traverse when reading the BAM files.
RAW files are tabular text files in which each line describes one mapped read pair, giving the chromosome, coordinate, and strand for each read end. HiFive can load any number of RAW files into a single HiC Data object.
    chr1 30002023 + chr3 4020235 -
    chr5 9326220 - chr1 3576222 +
    chr8 1295363 + chr6 11040321 +
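As a rough illustration of the RAW layout, the sketch below splits one line into its two read ends. The names `ReadEnd` and `parse_raw_line` are hypothetical and are not part of HiFive; the strict six-field check is an assumption made for the example.

```python
from typing import NamedTuple


class ReadEnd(NamedTuple):
    """One end of a mapped read pair (hypothetical container for illustration)."""
    chrom: str
    coord: int
    strand: str


def parse_raw_line(line):
    """Split a RAW line into two ReadEnd records, one per read end."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    chrom1, coord1, strand1, chrom2, coord2, strand2 = fields
    for strand in (strand1, strand2):
        if strand not in ("+", "-"):
            raise ValueError("unexpected strand value: %r" % strand)
    return ReadEnd(chrom1, int(coord1), strand1), ReadEnd(chrom2, int(coord2), strand2)


example = [
    "chr1 30002023 + chr3 4020235 -",
    "chr5 9326220 - chr1 3576222 +",
    "chr8 1295363 + chr6 11040321 +",
]
pairs = [parse_raw_line(line) for line in example]
```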
MAT files use a tabular text format previously defined for HiCPipe. Each line consists of a pair of fend indices and a count of observed occurrences of that pairing. These indices must match those associated with the Fend object used when loading the data. When using this format, it is therefore wise to also create the Fend object from a HiCPipe-style fend file to ensure that fends and counts are associated correctly.
    fend1 fend2 count
    1     4     10
    1     10    5
    1     13    1
In order to maintain compatibility with HiCPipe, both tabular fend files and MAT files are 1-indexed, rather than using the 0-based indexing found everywhere else in HiFive.
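The indexing difference is easy to get wrong when handling MAT data outside of HiFive, so here is a small sketch that reads MAT lines and shifts the 1-based fend indices to 0-based ones. The function name `parse_mat` is hypothetical, and the header detection (skipping any line whose first field is not numeric) is an assumption for this example rather than HiFive's behavior.

```python
def parse_mat(lines):
    """Return (fend1, fend2, count) tuples with fend indices converted to 0-based.

    Hypothetical helper for illustration; not part of HiFive.
    """
    records = []
    for line in lines:
        fields = line.split()
        # Assumption: header and blank lines are those without a numeric first field.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        fend1, fend2, count = (int(value) for value in fields)
        # MAT files are 1-indexed; subtract 1 to get 0-based fend indices.
        records.append((fend1 - 1, fend2 - 1, count))
    return records


example = [
    "fend1 fend2 count",
    "1 4 10",
    "1 10 5",
    "1 13 1",
]
records = parse_mat(example)
```

Only the two fend indices are shifted; the count column is left unchanged.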
Matrix files are tab-separated files that contain a square matrix of values corresponding to binned read counts. These files can contain labels, in which case the first line contains a tab followed by a tab-separated list of bin labels, and each subsequent line contains a label followed by bin values. Labels should be formatted so that the bin position occurs after the ‘|’ character and takes the form chrX:XXXX-XXXX (e.g. interval1|myexperiment|chr3:1000000-1040000). If no labels are provided, bins are assumed to be identical to the partitioning in the associated Fend object, starting with the first bin for the associated chromosome(s). Labeled matrices need not include all rows or columns for a given partitioning. Values falling outside of bins are discarded. Each chromosome or chromosome pair (inter-chromosomal interactions) should be in a separate file, with the chromosome name appearing before the first ‘.’ in the filename (e.g. chr1.matrix). Inter-chromosomal matrix files should be named with the chromosome names separated by ‘_by_’ (e.g. chr1_by_chr2.matrix).
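The label and filename conventions described above can be sketched as follows. Both helper names are hypothetical and this is not HiFive's parser; taking the text after the last ‘|’ as the bin position is an assumption based on the example label.

```python
import os
import re

# Bin position of the form chrX:XXXX-XXXX
POSITION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<stop>\d+)$")


def parse_bin_label(label):
    """Extract (chrom, start, stop) from a matrix bin label."""
    # Assumption: the bin position is whatever follows the last '|' character.
    position = label.rsplit("|", 1)[-1]
    match = POSITION_RE.match(position)
    if match is None:
        raise ValueError("label has no chrX:XXXX-XXXX position: %r" % label)
    return match["chrom"], int(match["start"]), int(match["stop"])


def chroms_from_filename(filename):
    """Return the chromosome name(s) encoded before the first '.' of a matrix filename."""
    prefix = os.path.basename(filename).split(".", 1)[0]
    # Inter-chromosomal files join the two chromosome names with '_by_'.
    return tuple(prefix.split("_by_"))


label_bin = parse_bin_label("interval1|myexperiment|chr3:1000000-1040000")
cis_chroms = chroms_from_filename("chr1.matrix")
trans_chroms = chroms_from_filename("chr1_by_chr2.matrix")
```

A one-element tuple indicates an intra-chromosomal matrix, and a two-element tuple indicates an inter-chromosomal one.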