Filters a set of mutations using either a supplied black list or the prevalence of their mismatches in a set of bam files. When no black list is given, mutations that have more than min_alt_reads in more than min_samples are removed.

filter_mutations(mutations, bams = NULL, black_list = NULL,
  tags = rep("", length(bams)), min_alt_reads = 2, min_samples = 2,
  min_base_quality = 20, max_depth = 1e+05, min_mapq = 30,
  substitution_specific = TRUE)

Arguments

mutations

A data frame with the reporter mutations. Should have the columns CHROM, POS, REF, ALT.

bams

a vector of paths to bam files

black_list

a character vector of genomic loci to filter, in the format chr_pos (or chr_pos_ref_alt when substitution_specific is TRUE). If not given, the bams will be scanned for mismatches at the mutation loci and the specified thresholds will be applied for filtering.

tags

a vector of the RG tags if the bam has more than one sample

min_alt_reads

the threshold of read counts showing alternative allele for a sample to be counted

min_samples

the threshold number of samples above which a mutation is filtered

min_base_quality

minimum base quality for a read to be counted

max_depth

maximum depth above which sampling will happen

min_mapq

the minimum mapping quality for a read to be counted

substitution_specific

logical, whether the loci in black_list are substitution-specific (chr_pos_ref_alt) rather than position-only (chr_pos).
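As an illustration of the two black list key formats, the following base R sketch builds keys from a mutations data frame and drops the rows that match. The helper make_keys and the two-row data frame are hypothetical and are not part of ctDNAtools; the sketch only mirrors the chr_pos and chr_pos_ref_alt formats described on this page.

```r
# Hypothetical mutations data frame with the required columns.
muts <- data.frame(
  CHROM = c("chr14", "chr14"),
  POS = c(106327474, 106327649),
  REF = c("C", "G"),
  ALT = c("G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a ctDNAtools function): build black list keys
# in either the chr_pos_ref_alt or the chr_pos format.
make_keys <- function(m, substitution_specific = TRUE) {
  # format() avoids scientific notation for large genomic positions
  pos <- format(m$POS, scientific = FALSE, trim = TRUE)
  if (substitution_specific) {
    paste(m$CHROM, pos, m$REF, m$ALT, sep = "_")
  } else {
    paste(m$CHROM, pos, sep = "_")
  }
}

black_list <- "chr14_106327474_C_G"

# Keep only the mutations whose key is absent from the black list.
kept <- muts[!(make_keys(muts) %in% black_list), ]
```

Here kept retains only the row at position 106327649, since the first row's key matches the black list entry.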

Value

the mutations data frame with the filtered mutations removed (as shown in the example output below).

Details

Filter a set of mutations using one of two options:

1. By providing a black list (recommended): a vector of genomic loci in the format chr_pos when substitution_specific is FALSE, or chr_pos_ref_alt when substitution_specific is TRUE. In this mode, all mutations reported in the black list are simply removed.

2. By providing a set of bam files. The function runs functionality similar to create_background_panel and filters mutations based on the min_alt_reads and min_samples criteria.
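The threshold rule used in the second mode can be sketched in base R as follows. This is not the package's implementation: the alt_counts matrix is made-up data standing in for the per-sample alternative allele counts, and reading "more than" as a strict inequality for both thresholds is an assumption.

```r
# Hypothetical alt-read counts: rows are mutations, columns are samples.
alt_counts <- matrix(
  c(0, 3, 5, 4,
    0, 0, 1, 0,
    3, 3, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("chr1_100_A_T", "chr1_200_G_C", "chr1_300_C_T"),
    paste0("s", 1:4)
  )
)

min_alt_reads <- 2
min_samples <- 2

# Number of samples in which each mutation exceeds min_alt_reads
# (strict ">" is an assumption based on the wording "more than").
samples_hit <- rowSums(alt_counts > min_alt_reads)

# A mutation is dropped when it is seen in more than min_samples samples.
drop <- samples_hit > min_samples

filtered <- rownames(alt_counts)[!drop]
```

With these made-up counts, only chr1_100_A_T is dropped: it exceeds min_alt_reads in three samples, whereas chr1_300_C_T does so in exactly two and is therefore kept under the strict reading.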

This function is called internally in test_ctDNA so you likely won't need to use it yourself.

See also

create_black_list, test_ctDNA, create_background_panel

Examples

data("mutations", package = "ctDNAtools")
filter_mutations(mutations, black_list = "chr14_106327474_C_G")
#> Filtering mutations ...
#> Dropped 1 mutations
#>    CHROM       POS REF ALT       PHASING
#> 2  chr14 106327649   G   T          <NA>
#> 3  chr14 106327759   A   T          <NA>
#> 4  chr14 106327821   T   C          <NA>
#> 5  chr14 106327838   T   A          <NA>
#> 6  chr14 106327869   C   A 106327869_C_A
#> 7  chr14 106327884   A   C 106327869_C_A
#> 8  chr14 106327909   A   C 106327869_C_A
#> 9  chr14 106327929   A   G 106327869_C_A
#> 10 chr14 106327966   C   T          <NA>