Runs through the target regions base by base counting the mismatches. Then it divides sum(mismatches)/sum(depths) for all bases in the targets

get_background_rate(bam, targets, reference, vaf_threshold = 0.1,
  tag = "", black_list = NULL, substitution_specific = TRUE,
  min_base_quality = 20, max_depth = 1e+05, min_mapq = 30)

Arguments

bam

path to bam file

targets

a data frame with the target regions. Must have three columns: chr, start and end

reference

the reference genome in BSgenome format

vaf_threshold

the bases with higher than this VAF threshold will be ignored in the calculation (real mutations)

tag

the RG tag if the bam has more than one sample

black_list

a character vector of genomic loci of format chr_pos if substitution_specific is false, or chr_pos_ref_alt if substitution_specific is true. The background will be computed on the target regions after excluding black_list loci.

substitution_specific

logical, whether to have the loci of black_list by substitutions.

min_base_quality

minimum base quality for a read to be counted

max_depth

maximum depth above which sampling will happen

min_mapq

the minimum mapping quality for a read to be counted

Value

a list containing the general mismatch rate and substitution-specific rates

Details

Computes the background rate of the input bam file for all bases in the specified targets. Substitutions-specific rates are also calculated.

Genomic positions having non-reference allele frequency higher than vaf_threshold will be excluded (to exclude SNPs and real mutations).

If a black_list is specified, the positions in the black_list (whether substitution_specific or not) will be excluded before computing the background rate.

See also

create_black_list test_ctDNA create_background_panel

Examples

# \donttest{ ## Load example data data("targets", package = "ctDNAtools") bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools") bamN1 <- system.file("extdata", "N1.bam", package = "ctDNAtools") bamN2 <- system.file("extdata", "N2.bam", package = "ctDNAtools") bamN3 <- system.file("extdata", "N3.bam", package = "ctDNAtools") ## Use human reference genome from BSgenome.Hsapiens.UCSC.hg19 library suppressMessages(library(BSgenome.Hsapiens.UCSC.hg19)) ## basic usage get_background_rate(bamT1, targets, BSgenome.Hsapiens.UCSC.hg19)
#> $rate #> [1] 0.0001508991 #> #> $CA #> [1] 5.128819e-05 #> #> $CG #> [1] 3.419213e-05 #> #> $CT #> [1] 0.0001111244 #> #> $TA #> [1] 2.339893e-05 #> #> $TC #> [1] 6.239714e-05 #> #> $TG #> [1] 2.339893e-05 #>
## more options get_background_rate(bamT1, targets, BSgenome.Hsapiens.UCSC.hg19, min_base_quality = 30, min_mapq = 40, vaf_threshold = 0.05 )
#> $rate #> [1] 0.0001403503 #> #> $CA #> [1] 4.332042e-05 #> #> $CG #> [1] 3.465634e-05 #> #> $CT #> [1] 0.000103969 #> #> $TA #> [1] 2.365334e-05 #> #> $TC #> [1] 6.307556e-05 #> #> $TG #> [1] 1.576889e-05 #>
## with blacklist bg_panel <- create_background_panel( bam_list = c(bamN1, bamN2, bamN3), targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19, substitution_specific = TRUE ) bl2 <- create_black_list(bg_panel, mean_vaf_quantile = 0.99, min_samples_one_read = 2, min_samples_two_reads = 1 )
#> 19 loci added satisfying Mean VAF condition
#> 0 loci added satisfying one read condition
#> 2 loci added satisfying two reads condition
#> Black list has 19 loci
get_background_rate(bamT1, targets, BSgenome.Hsapiens.UCSC.hg19, black_list = bl2 )
#> $rate #> [1] 0.0001427435 #> #> $CA #> [1] 5.128907e-05 #> #> $CG #> [1] 1.709636e-05 #> #> $CT #> [1] 0.0001111263 #> #> $TA #> [1] 2.339893e-05 #> #> $TC #> [1] 6.239714e-05 #> #> $TG #> [1] 2.339893e-05 #>
# }