This function scans the targets regions in the list of bam files, and reports the number of reference, non-reference reads for each loci in addition to the non-reference (VAF) allele frequency. Loci with VAF higher than vaf_threshold are masked with NA.

create_background_panel(bam_list, targets, reference,
  vaf_threshold = 0.05, bam_list_tags = rep("", length(bam_list)),
  min_base_quality = 10, max_depth = 1e+05, min_mapq = 20,
  substitution_specific = TRUE)

Arguments

bam_list

A character vector of paths to bam files.

targets

The targets data frame must have the columns chr, start and end.

reference

The reference genome as BSgenome object.

vaf_threshold

Loci with the fraction of non-reference reads above this value are masked with NA.

bam_list_tags

RG tags for the list of bam files. By default, the whole bam file will be used.

min_base_quality

The minimum base quality to count a read for a loci.

max_depth

Maximum depth for the pileup

min_mapq

The minimum mapping quality to count a read for a loci

substitution_specific

logical, whether to have the loci by substitutions.

Value

A named list having depth, alt and vaf data frames. Each has the same order of loci in rows and the input samples in columns.

Details

Extracts the depth, variant allele counts and variant allele frequency (VAF) for each genomic position in the input targets across a panel of bam files (e.g. from healthy samples to represent only technical noise). The extracted information can be fed to create_black_list in order to extract a black listed loci according to defined criteria

The function support two modes, either loci-specific regardless of the basepair substitution, or substitution-specific where each substitution class (e.g. C>T, C>G) are quantified separately. This behavior is controlled by the substitution_specific parameter.

VAF above vaf_threshold parameters are masked with NA, to exclude real SNPs/mutations.

Since this function can take a long time when the bam_list comprises a large number of bam files, the function supports multi-threading using the furrr and future R packages. All you need to do is call 'plan(multiprocess)' or other multi-threading strategies before calling this function.

See also

create_black_list test_ctDNA

Examples

# \donttest{ ## Load example data data("targets", package = "ctDNAtools") bamN1 <- system.file("extdata", "N1.bam", package = "ctDNAtools") bamN2 <- system.file("extdata", "N2.bam", package = "ctDNAtools") bamN3 <- system.file("extdata", "N3.bam", package = "ctDNAtools") ## Use human reference genome from BSgenome.Hsapiens.UCSC.hg19 library suppressMessages(library(BSgenome.Hsapiens.UCSC.hg19)) ## Use a black list based on loci bg_panel <- create_background_panel( bam_list = c(bamN1, bamN2, bamN3), targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19, substitution_specific = FALSE )
#> #> Attaching package: ‘purrr’
#> The following object is masked from ‘package:XVector’: #> #> compact
#> The following object is masked from ‘package:GenomicRanges’: #> #> reduce
#> The following object is masked from ‘package:IRanges’: #> #> reduce
bl1 <- create_black_list(bg_panel, mean_vaf_quantile = 0.99, min_samples_one_read = 2, min_samples_two_reads = 1 )
#> 7 loci added satisfying Mean VAF condition
#> 0 loci added satisfying one read condition
#> 2 loci added satisfying two reads condition
#> Black list has 7 loci
## Use a substitution-specific black list bg_panel <- create_background_panel( bam_list = c(bamN1, bamN2, bamN3), targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19, substitution_specific = TRUE ) bl2 <- create_black_list(bg_panel, mean_vaf_quantile = 0.99, min_samples_one_read = 2, min_samples_two_reads = 1 )
#> 19 loci added satisfying Mean VAF condition
#> 0 loci added satisfying one read condition
#> 2 loci added satisfying two reads condition
#> Black list has 19 loci
## Multi-threading (commented) ## library(furrr) ## plan(multiprocess) ## plan(multiprocess, workers = 3) bg_panel <- create_background_panel( bam_list = c(bamN1, bamN2, bamN3), targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19, substitution_specific = TRUE ) # }