R/get_background_rate.R
get_background_rate.Rd
Runs through the target regions base by base counting the mismatches. Then it divides sum(mismatches)/sum(depths) for all bases in the targets
get_background_rate(bam, targets, reference, vaf_threshold = 0.1, tag = "", black_list = NULL, substitution_specific = TRUE, min_base_quality = 20, max_depth = 1e+05, min_mapq = 30)
bam | path to bam file |
---|---|
targets | a data frame with the target regions. Must have three columns: chr, start and end |
reference | the reference genome in BSgenome format |
vaf_threshold | the bases with higher than this VAF threshold will be ignored in the calculation (real mutations) |
tag | the RG tag if the bam has more than one sample |
black_list | a character vector of genomic loci of format chr_pos if substitution_specific is false, or chr_pos_ref_alt if substitution_specific is true. The background will be computed on the target regions after excluding black_list loci. |
substitution_specific | logical, whether to have the loci of black_list by substitutions. |
min_base_quality | minimum base quality for a read to be counted |
max_depth | maximum depth above which sampling will happen |
min_mapq | the minimum mapping quality for a read to be counted |
a list containing the general mismatch rate and substitution-specific rates
Computes the background rate of the input bam file for all bases in the specified targets. Substitutions-specific rates are also calculated.
Genomic positions having non-reference allele frequency higher than vaf_threshold will be excluded (to exclude SNPs and real mutations).
If a black_list is specified, the positions in the black_list (whether substitution_specific or not) will be excluded before computing the background rate.
create_black_list
test_ctDNA
create_background_panel
# \donttest{ ## Load example data data("targets", package = "ctDNAtools") bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools") bamN1 <- system.file("extdata", "N1.bam", package = "ctDNAtools") bamN2 <- system.file("extdata", "N2.bam", package = "ctDNAtools") bamN3 <- system.file("extdata", "N3.bam", package = "ctDNAtools") ## Use human reference genome from BSgenome.Hsapiens.UCSC.hg19 library suppressMessages(library(BSgenome.Hsapiens.UCSC.hg19)) ## basic usage get_background_rate(bamT1, targets, BSgenome.Hsapiens.UCSC.hg19)#> $rate #> [1] 0.0001508991 #> #> $CA #> [1] 5.128819e-05 #> #> $CG #> [1] 3.419213e-05 #> #> $CT #> [1] 0.0001111244 #> #> $TA #> [1] 2.339893e-05 #> #> $TC #> [1] 6.239714e-05 #> #> $TG #> [1] 2.339893e-05 #>## more options get_background_rate(bamT1, targets, BSgenome.Hsapiens.UCSC.hg19, min_base_quality = 30, min_mapq = 40, vaf_threshold = 0.05 )#> $rate #> [1] 0.0001403503 #> #> $CA #> [1] 4.332042e-05 #> #> $CG #> [1] 3.465634e-05 #> #> $CT #> [1] 0.000103969 #> #> $TA #> [1] 2.365334e-05 #> #> $TC #> [1] 6.307556e-05 #> #> $TG #> [1] 1.576889e-05 #>## with blacklist bg_panel <- create_background_panel( bam_list = c(bamN1, bamN2, bamN3), targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19, substitution_specific = TRUE ) bl2 <- create_black_list(bg_panel, mean_vaf_quantile = 0.99, min_samples_one_read = 2, min_samples_two_reads = 1 )#>#>#>#>get_background_rate(bamT1, targets, BSgenome.Hsapiens.UCSC.hg19, black_list = bl2 )#> $rate #> [1] 0.0001427435 #> #> $CA #> [1] 5.128907e-05 #> #> $CG #> [1] 1.709636e-05 #> #> $CT #> [1] 0.0001111263 #> #> $TA #> [1] 2.339893e-05 #> #> $TC #> [1] 6.239714e-05 #> #> $TG #> [1] 2.339893e-05 #># }