Summarizes fragment size in defined genomic regions

summarize_fragment_size(bam, regions, tag = "",
  summary_functions = list(Mean = mean, Median = median), ...)

Arguments

bam	the input bam file
regions	data frame containing the genomic regions. Must have the columns chr, start and end.
tag	the RG tag if the bam has more than one sample.
summary_functions	a named list containing the R functions used for summarization, e.g. mean, sd.
...	Other parameters passed to `get_fragment_size`

Value

a data frame with the first column having the regions in the format of chr:start-end, and other columns correspond to summary_functions.

Details

Fragment size for reads that are paired (optionally properly paired), whose both mates are mapped, not secondary or supplementary alignment, not duplicates, passed quality control, and satisfy mapq threshold will be used for summarization. The reads that overlap the specified regions will be summarized by the specified summary_functions. Overlaps consider fragments to span the left most to the right most coordinate from either the read or the mate. Minimum and maximum bounds of the fragment size will be applied before summarization.

Examples

data("targets", package = "ctDNAtools")
bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools")

## binning the target in arbitrary way
## Note that regions don't need to be bins,
## they can be any regions in the genome
regions <- data.frame(
  chr = targets$chr,
  start = seq(from = targets$start - 200, to = targets$end + 200, by = 30),
  stringsAsFactors = FALSE
)
regions$end <- regions$start + 50

## basic usage
sfs <- summarize_fragment_size(bam = bamT1, regions = regions)

## different summary functions
sfs <- summarize_fragment_size(
  bam = bamT1, regions = regions,
  summary_functions = list(
    Var = var, SD = sd,
    meanSD = function(x) mean(x) / sd(x)
  )
)

Summarizes fragment size in defined genomic regions

Arguments

Value

Details

See also

Examples

Contents