R/summarize_fragment_size.R
summarize_fragment_size.Rd
Summarizes fragment size in defined genomic regions
summarize_fragment_size(bam, regions, tag = "", summary_functions = list(Mean = mean, Median = median), ...)
bam | the input bam file |
---|---|
regions | data frame containing the genomic regions. Must have the columns chr, start and end. |
tag | the RG tag if the bam has more than one sample. |
summary_functions | a named list containing the R functions used for summarization, e.g. mean, sd. |
... | Other parameters passed to |
a data frame with the first column having the regions in the format of chr:start-end, and other columns correspond to summary_functions.
Fragment size for reads that are paired (optionally properly paired), whose both mates are mapped, not secondary or supplementary alignment, not duplicates, passed quality control, and satisfy mapq threshold will be used for summarization. The reads that overlap the specified regions will be summarized by the specified summary_functions. Overlaps consider fragments to span the left most to the right most coordinate from either the read or the mate. Minimum and maximum bounds of the fragment size will be applied before summarization.
get_fragment_size
bin_fragment_size
analyze_fragmentation
data("targets", package = "ctDNAtools") bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools") ## binning the target in arbitrary way ## Note that regions don't need to be bins, ## they can be any regions in the genome regions <- data.frame( chr = targets$chr, start = seq(from = targets$start - 200, to = targets$end + 200, by = 30), stringsAsFactors = FALSE ) regions$end <- regions$start + 50 ## basic usage sfs <- summarize_fragment_size(bam = bamT1, regions = regions) ## different summary functions sfs <- summarize_fragment_size( bam = bamT1, regions = regions, summary_functions = list( Var = var, SD = sd, meanSD = function(x) mean(x) / sd(x) ) )