Gets fragment lengths from a bam file

A function to extract fragment lengths from a bam file. Optionally, given a mutation data frame, it can categorize read lengths in mutated vs non-mutated reads.

get_fragment_size(bam, mutations = NULL, targets = NULL, tag = "",
  isProperPair = NA, mapqFilter = 30, min_size = 1, max_size = 400,
  ignore_trimmed = TRUE, different_strands = TRUE,
  simple_cigar = FALSE)

Arguments

bam	path to bam file.
mutations	An optional data frame with mutations. Must have the columns CHROM, POS, REF, ALT.
targets	a data frame with the target regions to restrict the reads in the bam. Must have three columns: chr, start and end
tag	the RG tag if the bam has more than one sample.
isProperPair	a logical whether to return only proper pairs (true), only improper pairs (false), or it does not matter (NA).
mapqFilter	mapping quality threshold for considering reads.
min_size	Integer with the lowest fragment length.
max_size	Integer with the highest fragment length.
ignore_trimmed	logical, whether to remove reads that have been hard trimmed.
different_strands	logical, whether to keep only reads whose mates map to different strand.
simple_cigar	logical, whether to include only reads with simple cigar.

Value

A data frame with the columns:

Sample: The SM tag in bam or file name
ID: the read ID
chr: chromosome
start: the left most end of either the read or mate
end: the right most end of either the read or mate.
size: the fragment size
category (only if mutations is provided): either ref, alt, or other

Details

Extracts the fragment size of reads in the input bam that satisfy the following conditions:

Paired, and optionally properly paired depending on the isProperPair parameter.
Both the reads and mate are mapped.
Not secondary or supplementary alignment
Not duplicate
Passing quality control
Read and mate on different strands (optional depending on the different_strands parameter)
Not trimmed (optional depending on the ignore_trimmed parameter), i.e. will keep only reads with the max length.
Having a simple cigar (optional depending on the simple_cigar parameter).
Satisfy the mapping quality threshold specified in the mapqFilter parameter.
Reads and mates on the same chromosome when min_size > 0.

When the input mutations is given, the output will label the reads that support the variant alleles of the mutation in a separate column.

Examples

# \donttest{
data("mutations", package = "ctDNAtools")
data("targets", package = "ctDNAtools")
bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools")

## basic usage
fs <- get_fragment_size(bam = bamT1)

## More options
fs1 <- get_fragment_size(
  bam = bamT1, isProperPair = TRUE, min_size = 70,
  max_size = 200, ignore_trimmed = FALSE, different_strands = FALSE,
  simple_cigar = TRUE
)

## with mutations input
fs2 <- get_fragment_size(bam = bamT1, mutations = mutations)

## using targets
fs3 <- get_fragment_size(bam = bamT1, targets = targets)
# }

Arguments

Value

Details

See also

Examples

Contents