A function to extract fragment lengths from a bam file. Optionally, given a mutation data frame, it can categorize read lengths in mutated vs non-mutated reads.
get_fragment_size(bam, mutations = NULL, targets = NULL, tag = "", isProperPair = NA, mapqFilter = 30, min_size = 1, max_size = 400, ignore_trimmed = TRUE, different_strands = TRUE, simple_cigar = FALSE)
bam | path to bam file. |
---|---|
mutations | An optional data frame with mutations. Must have the columns CHROM, POS, REF, ALT. |
targets | a data frame with the target regions to restrict the reads in the bam. Must have three columns: chr, start and end |
tag | the RG tag if the bam has more than one sample. |
isProperPair | a logical whether to return only proper pairs (true), only improper pairs (false), or it does not matter (NA). |
mapqFilter | mapping quality threshold for considering reads. |
min_size | Integer with the lowest fragment length. |
max_size | Integer with the highest fragment length. |
ignore_trimmed | logical, whether to remove reads that have been hard trimmed. |
different_strands | logical, whether to keep only reads whose mates map to different strand. |
simple_cigar | logical, whether to include only reads with simple cigar. |
A data frame with the columns:
Sample: The SM tag in bam or file name
ID: the read ID
chr: chromosome
start: the left most end of either the read or mate
end: the right most end of either the read or mate.
size: the fragment size
category (only if mutations is provided): either ref, alt, or other
Extracts the fragment size of reads in the input bam that satisfy the following conditions:
Paired, and optionally properly paired depending on the isProperPair parameter.
Both the reads and mate are mapped.
Not secondary or supplementary alignment
Not duplicate
Passing quality control
Read and mate on different strands (optional depending on the different_strands parameter)
Not trimmed (optional depending on the ignore_trimmed parameter), i.e. will keep only reads with the max length.
Having a simple cigar (optional depending on the simple_cigar parameter).
Satisfy the mapping quality threshold specified in the mapqFilter parameter.
Reads and mates on the same chromosome when min_size > 0.
When the input mutations is given, the output will label the reads that support the variant alleles of the mutation in a separate column.
summarize_fragment_size
bin_fragment_size
analyze_fragmentation
get_mutations_fragment_size
# \donttest{ data("mutations", package = "ctDNAtools") data("targets", package = "ctDNAtools") bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools") ## basic usage fs <- get_fragment_size(bam = bamT1) ## More options fs1 <- get_fragment_size( bam = bamT1, isProperPair = TRUE, min_size = 70, max_size = 200, ignore_trimmed = FALSE, different_strands = FALSE, simple_cigar = TRUE ) ## with mutations input fs2 <- get_fragment_size(bam = bamT1, mutations = mutations) ## using targets fs3 <- get_fragment_size(bam = bamT1, targets = targets) # }