This function filters a set of mutations using either a given black list or the prevalence of their mismatches in a set of bam files. When no black list is given, mutations that have more than min_alt_reads alternative-allele reads in more than min_samples samples are removed.
filter_mutations(mutations, bams = NULL, black_list = NULL, tags = rep("", length(bams)), min_alt_reads = 2, min_samples = 2, min_base_quality = 20, max_depth = 1e+05, min_mapq = 30, substitution_specific = TRUE)
Argument | Description |
---|---|
mutations | A data frame with the reporter mutations. It should have the columns CHROM, POS, REF and ALT. |
bams | A vector of paths to bam files. |
black_list | A character vector of genomic loci to filter, in the format chr_pos (or chr_pos_ref_alt when substitution_specific is TRUE). If not given, the bams are scanned for mismatches at the mutation loci and the specified thresholds are applied for filtering. |
tags | A vector of read-group (RG) tags, one per bam file, needed when a bam contains more than one sample. |
min_alt_reads | The threshold on the number of reads showing the alternative allele for a sample to be counted. |
min_samples | The threshold on the number of samples above which a mutation is filtered. |
min_base_quality | The minimum base quality for a read to be counted. |
max_depth | The maximum depth above which reads are sampled. |
min_mapq | The minimum mapping quality for a read to be counted. |
substitution_specific | Logical; whether the black_list loci are substitution-specific (chr_pos_ref_alt) rather than position-only (chr_pos). |
A data frame containing the mutations that passed the filter, that is, the input rows that were not removed (see the example below).
Filter a set of mutations using one of two options:
By providing a black list (recommended): a vector of genomic loci in the format chr_pos when substitution_specific is FALSE, or chr_pos_ref_alt when substitution_specific is TRUE. In this mode, all mutations listed in the black list are simply removed.
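To make the two key formats concrete, the base-R sketch below builds one key per mutation and drops those found in the black list. It is an illustration of the described behaviour, not the ctDNAtools source: the helper `filter_by_black_list` and the toy data are hypothetical, and POS is stored as an integer so that `paste()` never renders it in scientific notation.

```r
# Hypothetical sketch of black-list filtering (not the ctDNAtools source).
filter_by_black_list <- function(mutations, black_list, substitution_specific = TRUE) {
  # Build one key per mutation in the same format as the black-list entries
  keys <- if (substitution_specific) {
    paste(mutations$CHROM, mutations$POS, mutations$REF, mutations$ALT, sep = "_")
  } else {
    paste(mutations$CHROM, mutations$POS, sep = "_")
  }
  # Keep only the mutations whose key is not black-listed
  mutations[!keys %in% black_list, , drop = FALSE]
}

toy <- data.frame(
  CHROM = c("chr14", "chr14", "chr14"),
  POS = c(106327474L, 106327474L, 106327649L),  # integer positions
  REF = c("C", "C", "G"),
  ALT = c("G", "T", "T"),
  stringsAsFactors = FALSE
)

# Substitution-specific key: only the C>G change at 106327474 is removed
by_substitution <- filter_by_black_list(toy, "chr14_106327474_C_G")

# Position-only key: both mutations at 106327474 are removed
by_position <- filter_by_black_list(toy, "chr14_106327474", substitution_specific = FALSE)
```

The position-only format is the more aggressive of the two, since any alternative allele at a listed locus is discarded.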
By providing a set of bam files: the function scans the bams at the mutation loci in a similar way to create_background_panel and filters mutations based on the min_alt_reads and min_samples criteria.
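The recurrence criterion can be sketched in base R as follows. This is a hypothetical illustration, not the package's implementation: it assumes the alternative-allele read counts have already been extracted from the bams into a matrix (one row per mutation, one column per sample), and the helper `filter_by_recurrence` is made up for this example. It reads "more than" literally as a strict comparison; if the package applies `>=` instead, the operators would need adjusting.

```r
# Hypothetical sketch of the min_alt_reads / min_samples criterion.
# Assumes alt_counts is a matrix: one row per mutation, one column per sample.
filter_by_recurrence <- function(mutations, alt_counts, min_alt_reads = 2, min_samples = 2) {
  stopifnot(nrow(alt_counts) == nrow(mutations))
  # Per mutation: number of samples with more than min_alt_reads alt reads
  n_supporting <- rowSums(alt_counts > min_alt_reads)
  # Remove mutations supported in more than min_samples samples
  mutations[n_supporting <= min_samples, , drop = FALSE]
}

muts <- data.frame(
  CHROM = c("chr14", "chr14", "chr14"),
  POS = c(106327649L, 106327759L, 106327821L),
  REF = c("G", "A", "T"),
  ALT = c("T", "T", "C"),
  stringsAsFactors = FALSE
)

# Toy alt read counts for 4 samples
alt_counts <- matrix(c(
  0, 1, 0, 0,  # mutation 1: no sample above 2 reads -> kept
  5, 4, 3, 0,  # mutation 2: 3 samples above 2 reads -> removed
  3, 3, 0, 0   # mutation 3: 2 samples, not more than 2 -> kept
), nrow = 3, byrow = TRUE)

kept <- filter_by_recurrence(muts, alt_counts)
```

Counting supporting samples per row keeps the rule transparent: a mutation seen recurrently across unrelated samples is more likely to be a technical artefact than a true tumour-specific variant.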
This function is called internally by test_ctDNA, so you likely won't need to use it yourself.
create_black_list
test_ctDNA
create_background_panel
data("mutations", package = "ctDNAtools")
filter_mutations(mutations, black_list = "chr14_106327474_C_G")
#>    CHROM       POS REF ALT       PHASING
#> 2  chr14 106327649   G   T          <NA>
#> 3  chr14 106327759   A   T          <NA>
#> 4  chr14 106327821   T   C          <NA>
#> 5  chr14 106327838   T   A          <NA>
#> 6  chr14 106327869   C   A 106327869_C_A
#> 7  chr14 106327884   A   C 106327869_C_A
#> 8  chr14 106327909   A   C 106327869_C_A
#> 9  chr14 106327929   A   G 106327869_C_A
#> 10 chr14 106327966   C   T          <NA>