Summarizes fragment size in defined genomic regions

summarize_fragment_size(bam, regions, tag = "",
  summary_functions = list(Mean = mean, Median = median), ...)

Arguments

bam

the input bam file

regions

data frame containing the genomic regions. Must have the columns chr, start and end.

tag

the RG tag if the bam has more than one sample.

summary_functions

a named list containing the R functions used for summarization, e.g. mean, sd.

...

Other parameters passed to get_fragment_size

Value

a data frame with the first column having the regions in the format of chr:start-end, and other columns correspond to summary_functions.

Details

Fragment size for reads that are paired (optionally properly paired), whose both mates are mapped, not secondary or supplementary alignment, not duplicates, passed quality control, and satisfy mapq threshold will be used for summarization. The reads that overlap the specified regions will be summarized by the specified summary_functions. Overlaps consider fragments to span the left most to the right most coordinate from either the read or the mate. Minimum and maximum bounds of the fragment size will be applied before summarization.

See also

get_fragment_size bin_fragment_size analyze_fragmentation

Examples

data("targets", package = "ctDNAtools") bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools") ## binning the target in arbitrary way ## Note that regions don't need to be bins, ## they can be any regions in the genome regions <- data.frame( chr = targets$chr, start = seq(from = targets$start - 200, to = targets$end + 200, by = 30), stringsAsFactors = FALSE ) regions$end <- regions$start + 50 ## basic usage sfs <- summarize_fragment_size(bam = bamT1, regions = regions) ## different summary functions sfs <- summarize_fragment_size( bam = bamT1, regions = regions, summary_functions = list( Var = var, SD = sd, meanSD = function(x) mean(x) / sd(x) ) )