% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sc_mutations.R
\name{find_variants}
\alias{find_variants}
\title{Bulk variant identification}
\usage{
find_variants(
  bam_path,
  reference,
  annotation,
  min_nucleotide_depth = 100,
  homopolymer_window = 3,
  annotated_region_only = FALSE,
  names_from = "gene_name",
  threads = 1
)
}
\arguments{
\item{bam_path}{character(1) or character(n): path to the bam file(s) aligned to the
reference genome (NOT the transcriptome!).}

\item{reference}{DNAStringSet: the reference genome}

\item{annotation}{GRanges: the annotation of the reference genome. You can load
a GTF/GFF annotation file with \code{anno <- rtracklayer::import(file)}.}

\item{min_nucleotide_depth}{integer(1): minimum read depth for a position to be
considered a variant.}

\item{homopolymer_window}{integer(1): the window size used to calculate the homopolymer
percentage. The homopolymer percentage is the percentage of the most frequent
nucleotide among the positions from \code{pos - homopolymer_window} to
\code{pos + homopolymer_window} around the variant position \code{pos}, excluding the
variant position itself. Calculation of the homopolymer percentage is skipped when
\code{homopolymer_window = 0}. The homopolymer percentage is useful for filtering out
Nanopore sequencing errors in homopolymer regions.}

\item{annotated_region_only}{logical(1): whether to only consider variants within
annotated regions. If \code{TRUE}, only variants inside annotated regions will be
returned. If \code{FALSE}, variants in unannotated regions are returned as well, which
can take significantly longer.}

\item{names_from}{character(1): the name of the metadata column of the annotation
(\code{mcols(annotation)[, names_from]}) whose values are used for the \code{region}
column in the output.}

\item{threads}{integer(1): number of threads to use. Threading is done over each
annotated region and (if \code{annotated_region_only = FALSE}) unannotated gaps for
each bam file.}
}
\value{
A tibble with columns: seqnames, pos, nucleotide, count, sum, freq, ref, region,
homopolymer_pct, bam_path. The homopolymer percentage (\code{homopolymer_pct}) is the
percentage of the most frequent nucleotide among the positions from
\code{pos - homopolymer_window} to \code{pos + homopolymer_window} around the variant
position, excluding the variant position itself.
}
\description{
Treat each bam file as a bulk sample and identify variants against the reference genome.
}
\details{
Each bam file is treated as a bulk sample to perform pileup and identify variants.
The variants identified by this function can then be passed to \code{sc_mutations}
to obtain single-cell allele counts. Note that in some reference genome FASTA files
the sequence name lines carry a description after the chromosome name, e.g.
\code{>chr1 1} instead of \code{>chr1}. You may need to remove everything after the
first space so that the names match the chromosome names in the bam file, for example
with \code{names(ref) <- sub(" .*", "", names(ref))}.
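
The homopolymer percentage described for the \code{homopolymer_window} argument can
be written as a formula. This is a sketch of the documented definition rather than
code taken from the implementation, and it assumes the window is not truncated at
either end of the chromosome. With variant position \eqn{p}, window size \eqn{w}, and
\eqn{n_b} the number of positions in \code{(p - w):(p - 1)} and \code{(p + 1):(p + w)}
whose nucleotide is \eqn{b},
\deqn{h = 100 \times \frac{\max_{b} n_b}{2w}}{h = 100 * max_b(n_b) / (2w)}
where the factor of 100 assumes the value is reported as a percentage rather than a
proportion. For example, with \eqn{w = 3} and flanking sequences \code{AAA} and
\code{AAT}, the most frequent nucleotide is A with \eqn{n_A = 5}, so
\eqn{h = 100 \times 5/6 \approx 83.3}{h = 100 * 5/6, about 83.3}.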
}
\examples{
ppl <- example_pipeline("SingleCellPipeline")
ppl <- run_step(ppl, "genome_alignment")
variants <- find_variants(
  bam_path = ppl@genome_bam,
  reference = ppl@genome_fa,
  annotation = ppl@annotation,
  min_nucleotide_depth = 4
)
head(variants)
}
