% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/RNAsequences.R
\name{RNAsequences}
\alias{RNAsequences}
\title{Extract RNA sequence from sRNA clusters}
\usage{
RNAsequences(
  data,
  original = FALSE,
  method = c("consensus", "set"),
  match.threshold = 1,
  duplicates = "random",
  tidy = FALSE
)
}
\arguments{
\item{data}{data.frame; generated by \code{\link[=RNAimport]{RNAimport()}}}

\item{original}{logical; output results to the working data frame as additional
columns (\code{original=TRUE}), or output as a new data frame
(\code{original=FALSE}). By default, \code{original=FALSE}.}

\item{method}{character; string to define the method. Either "consensus" or
"set". The "consensus" method identifies the consensus sequence across
replicates based on the \code{bioseq} package method. The "set" method instead
takes the fixed sequence calculated for each replicate and checks whether
these are exact matches.}

\item{match.threshold}{numeric; the minimum number of replicates
required to share the sRNA sequence to count as a match. Default is 1. Only
applicable to the "set" method.}

\item{duplicates}{character; string to define how to deal with a tie,
"random" as default. Options include "random" and "exclude". Only
applicable to the "set" method.}

\item{tidy}{logical; tidy up the data set by removing sRNA clusters with an
unknown or unclassified consensus sRNA sequence result. By default,
\code{tidy=FALSE}, while \code{tidy=TRUE} removes sRNA clusters with
an undetermined consensus RNA sequence.}
}
\value{
The results can be added as additional columns to the working
data frame supplied to the function or stored as a new data frame containing
only the results from the function. The results include:
\itemize{
\item Match: character; whether the RNA sequence is consistent across replicates ("Yes", "No" or "Duplicate")
\item Sequence: character; sequence of the most abundant sRNA within a cluster
\item Complementary_RNA: character; complementary RNA nucleotide sequence
\item Complementary_DNA: character; complementary DNA nucleotide sequence
\item Width: numeric; length of nucleotide sequence
}
}
\description{
\code{RNAsequences} extrapolates the RNA sequence for sRNA clusters
using one of two methods, both of which utilise the RNA sequence of the most
abundant transcript identified within each replicate. The sequence is
determined either by extracting the consensus sequence across replicates or
by comparing the sequences across replicates and selecting the most abundant.
In the second method, ties between sequences can occur; the user must
therefore decide whether a sequence is chosen at random from the tied, most
abundant sequences or whether no sequence is determined.

The function also calculates the complementary RNA and DNA sequences and
reports the length (width) of the sequence.
}
\details{
The set method checks whether each sample in the data set shares
the same major sRNA sequence for a given sRNA cluster. If at least two
replicates share the same sRNA sequence, the sequence is pulled and the
complementary DNA and RNA sequences are calculated. The
\code{match.threshold} parameter alters the minimum number of replicates
required to share the RNA sequence to count as a match. For example, with
\code{match.threshold=3}, at least 3 replicates must contain the same sequence.
As a general rule, if only one replicate has a determined sRNA sequence, no
match is recorded, but the sequence is still pulled and the complementary
sequences are calculated.

The match column returns "Yes", "No" or "Duplicate". If a match between
replicates is found, "Yes" is supplied; if not, "No". If there is a tie
between sequences, "Duplicate" is supplied, for example when an equal number
of replicates have sequence "x" and sequence "y".

Where duplicates are identified, a sequence is selected at random from the
tied sequences by default. Setting the \code{duplicates} parameter to
"exclude" means that no sequence is pulled for the cluster.

With the consensus method, by contrast, the consensus sequence is pulled from
all replicates.
}
\examples{

data("sRNA_data")

# vector of control names
controls <- c("selfgraft_1", "selfgraft_2", "selfgraft_3")

# Locate potentially mobile sRNA clusters associated to tomato, no
# statistical analysis
sRNA_data_mobile <- RNAmobile(input = "sRNA", data = sRNA_data,
                              controls = controls, genome.ID = "B_",
                              task = "keep", statistical = FALSE)


mobile_sequences <- RNAsequences(sRNA_data_mobile, method = "consensus") 
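
# A further sketch using the "set" method. The argument values below are
# illustrative assumptions, not recommendations: at least two replicates
# must share the same major sequence to count as a match
# (match.threshold = 2), ties are left undetermined rather than resolved
# at random (duplicates = "exclude"), and clusters without a determined
# sequence are removed (tidy = TRUE).
mobile_sequences_set <- RNAsequences(sRNA_data_mobile, method = "set",
                                     match.threshold = 2,
                                     duplicates = "exclude", tidy = TRUE)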

}
