% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/com_hotspot.R
\name{com_hotspot}
\alias{com_hotspot}
\title{Comprehensive amplicon ranking}
\usage{
com_hotspot(fw_panel, bins, data, amp, len, size, include_genes)
}
\arguments{
\item{fw_panel}{A dataframe containing the sequencing panel designed by \code{fw_hotspot}}

\item{bins}{A dataframe containing all potential amplicons}

\item{data}{A dataframe containing the location of each mutation.}

\item{amp}{The length of amplicons in number of base pairs}

\item{len}{The total length of sequencing panel in number of base pairs}

\item{size}{The hotspot size threshold, in number of amplicons, above which a hotspot is split into smaller regions}

\item{include_genes}{\code{TRUE} or \code{FALSE}, indicating whether the dataset includes gene names}
}
\value{
A dataframe containing the genomic coordinates for the targeted sequencing panel
}
\description{
Creates a targeted sequencing panel by finding the amplicons that are likely
to capture the most mutations, using a pseudo-exhaustive selection method.
}
\details{
Comprehensive Selection Sequencing Panel Identifier (Optimal mutation capture)

1.	To conserve computational power, the forward selection sequencing panel
identifier is run first to determine the lowest number of mutations per
amplicon (mutation frequency) that needs to be included in a sequencing panel
of the predetermined length.
  a.	Any amplicon generated by the algorithm whose mutation frequency is
  below this threshold value is removed.

2.	To make exhaustive selection of amplicon combinations feasible, the
algorithm breaks hotspot regions longer than the predefined number of
amplicons into multiple smaller regions.
  a.	The amplicons covering these regions are pulled from the amplicon pool,
  based on their unique IDs.

3.	The algorithm finds the minimum number of overlapping amplicons, locates
all positions with this value, and identifies the longest continuous run of
positions at that minimum.
  a.	The region is split at the center of this longest continuous run of
  minimum values, and the splitting process continues until all smaller
  regions are shorter than the "n" amplicon length set by the user.
     i.	Lowering this set number of amplicons often also reduces the
     computation time required.

4.	All amplicons contained in these bins are added back to the amplicon pool,
based on a new unique ID.

5.	Amplicons covering hotspots less than or equal to one amplicon length are
added to the final sequencing panel dataset.

6.	To determine the optimal combination of amplicons for each region, the number
of amplicons necessary for full coverage of the bin is calculated.

7.	A list is generated of every possible combination of n amplicons, where n
is the number of amplicons needed. For each combination of amplicons:
  a.	Amplicons that would not meet the threshold of unique mutations are
  filtered out, and the number of all mutations captured by the remaining
  amplicons is calculated.
  b.	The combination of amplicons that yields the highest number of mutations
  is added to the final sequencing panel.

8.	All amplicons in the final sequencing panel are ranked from highest to lowest
based on the number of mutations they cover.

9.	All amplicons capturing a number of mutations equal to the cutoff are
further ranked to favor amplicons whose mutations lie closer to the center
of the amplicon.

10.	Cumulative base-pair length and cumulative mutations covered by each
amplicon are calculated.
  a.	Depending on the desired length of the targeted panel, a cutoff may be
  applied to remove all amplicons that fall below a set cumulative length.
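
The splitting rule in step 3 can be illustrated with a minimal base R sketch.
It is not the package's internal code: the vector \code{overlap} (a
hypothetical count of overlapping amplicons at each position of one region)
and all other names are assumptions made for illustration, and only a single
split is shown rather than the repeated splitting described above.

\preformatted{
# Hypothetical per-position amplicon overlap counts for one hotspot region
overlap <- c(3, 2, 1, 1, 1, 2, 3, 1, 2)

# Run-length encode the positions that sit at the minimum overlap
runs <- rle(overlap == min(overlap))
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Pick the longest run of minimum values (runs above the minimum score 0)
longest <- which.max(ifelse(runs$values, runs$lengths, 0))

# Split the region at the center of that run
split_at <- floor((run_start[longest] + run_end[longest]) / 2)
}

With this toy input, the longest run of minimum values spans positions 3 to 5,
so the region would be split at position 4.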
}
\examples{

data("mutation_data")
my_bins <- amp_pool(mutation_data, 100)

my_fw_panel <- fw_hotspot(my_bins, mutation_data, 100, 1000, TRUE)

com_hotspot(my_fw_panel, my_bins, mutation_data, 100, 1000, 3, TRUE)
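
# A minimal, self-contained sketch of the combination step described in
# Details (step 7), using toy data and base R only. This is NOT the package's
# internal implementation: every name below (toy_mut_pos, toy_starts,
# toy_amp_len, n_needed, count_captured) is hypothetical, and the filtering
# of amplicons below the mutation threshold (step 7a) is omitted for brevity.
toy_mut_pos <- c(105, 110, 150, 220, 230, 290)  # assumed mutation positions
toy_starts  <- c(100, 130, 200, 250)            # assumed amplicon start sites
toy_amp_len <- 100                              # amplicon length in base pairs
n_needed    <- 2                                # amplicons needed for the region

# Enumerate every combination of n_needed amplicons (by index)
combos <- combn(seq_along(toy_starts), n_needed, simplify = FALSE)

# Count the distinct mutations covered by one combination of amplicons;
# each amplicon covers the half-open interval [start, start + toy_amp_len)
count_captured <- function(idx) {
  covered <- vapply(
    toy_mut_pos,
    function(p) any(p >= toy_starts[idx] & p < toy_starts[idx] + toy_amp_len),
    logical(1)
  )
  sum(covered)
}

scores <- vapply(combos, count_captured, integer(1))

# Keep the combination that captures the most mutations
best <- combos[[which.max(scores)]]
toy_starts[best]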

}
