% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/centerExpandRegions.R
\name{centerExpandRegions}
\alias{centerExpandRegions}
\title{Modifies genomic regions by centering and then expanding them}
\usage{
centerExpandRegions(
  data,
  centerBy = "center_column",
  expandBy = NULL,
  genome = NA,
  trim_start = TRUE,
  outputFormat = "GenomicRanges",
  showMessages = TRUE
)
}
\arguments{
\item{data}{peakCombiner data frame structure with required columns
named \code{chrom}, \code{start}, \code{end}, \code{name},
\code{score}, \code{strand}, \code{center}, \code{sample_name}. Additional
columns will be maintained.}

\item{centerBy}{Allowed values are 'center_column' (default) or
'midpoint'.
\itemize{
\item 'center_column' uses the value stored in the column \code{center} to center.
\item 'midpoint' replaces the value stored in the column \code{center} with the
region midpoint obtained via \code{\link[GenomicRanges:intra-range-methods]{GenomicRanges::resize()}}. The expansion then follows the user
input and uses \code{\link[GenomicRanges:intra-range-methods]{GenomicRanges::promoters()}} to allow symmetric and asymmetric
expansion. Note that strand information, if provided, is maintained.
}}

\item{expandBy}{Allowed values are a numeric vector of length 1 or 2,
or 'NULL' (default).
\itemize{
\item The value from the numeric vector of length 1
is expanded in both directions from center to define
the genomic region.
Thus, the size of the resulting genomic region is 2x
the provided value + 1 (for the center coordinate).
\item The value of the numeric vector of length 2
subtracts the first value from the center and adds
the second value to the center to define the genomic
region. Thus, the size of the genomic regions is
the sum of the first value + the second value
+ 1 (for the center coordinate).
\item 'NULL' allows for data-driven definition of the
\code{expandBy} value. It calculates the median
genomic region size of the input data and uses this
value like a length 1 numeric vector for expansion.
}}

\item{genome}{Character value to define the matching genome reference to
the input data. Default value is NA. Allowed values are
based on GenomicRanges supported genomes like "GRCh38",
"GRCh38.p13", "Amel_HAv3.1", "WBcel235", "TAIR10.1",
"hg38", "mm10", "rn6", "bosTau9", "canFam3", "musFur1",
"galGal6", "dm6", "ce11", and "sacCer3". Please see also
help for \code{\link[Seqinfo:Seqinfo-class]{Seqinfo::Seqinfo()}} for more details.}

\item{trim_start}{Logical value of TRUE (default) or FALSE. If TRUE, and
no valid reference genome is provided in \code{genome},
resulting genomic regions with negative starting
coordinates will have their start set to 1.}

\item{outputFormat}{Character value to define format of output object.
Accepted values are "GenomicRanges" (default), "tibble"
or "data.frame".}

\item{showMessages}{Logical value of TRUE (default) or FALSE. Defines if
info messages are displayed or not.}
}
\value{
An object in the format selected with \code{outputFormat} (a GenomicRanges
object by default, or a tibble or data frame) with the columns \code{chrom}, \code{start}, \code{end}, \code{name}, \code{score},
\code{strand}, \code{center}, \code{sample_name}. The definitions of these columns are
described in full in the \link{prepareInputRegions} Details.
Use as input for functions \code{\link[=filterRegions]{filterRegions()}} and
\code{\link[=combineRegions]{combineRegions()}}.
}
\description{
\code{\link[=centerExpandRegions]{centerExpandRegions()}} is an optional step that re-defines the
genomic regions by expanding them from their center. The center information
has to be stored in the input data column \code{center}, while the information for
the expansion can either be user provided or input data derived. The accepted
input is a data frame created from \code{\link[=prepareInputRegions]{prepareInputRegions()}}.
Please see \code{\link[=prepareInputRegions]{prepareInputRegions()}} for more details.
}
\details{
This is an optional function that resizes the genomic regions based on the
input peakCombiner standard data frame and the options you select. An
expected input data frame contains the following columns with the names:
\code{chrom}, \code{start}, \code{end}, \code{name}, \code{score}, \code{strand}, \code{center}, \code{sample_name}.
Such a data frame is created by the function
\link{prepareInputRegions}. This step is useful if you want all of
your peaks to be the same size for your downstream analyses. In addition, if
you want to use the "summit" information, typically reported by some peak
callers (e.g., MACS2), this function allows you to automatically center your
regions of interest on these summits. This enables you to capture
information about the most important region within a genomic region (e.g.,
TF-binding site or highest peak) and put that region in the center of your
downstream analyses (e.g., applicable to motif-finding or "heatmaps"
summarizing multiple genomic regions).

There are two concepts that are relevant for
\link{centerExpandRegions}: how to define the center, and how much
to expand from the center.
\subsection{How to define the center?}{

When preparing your input regions, it is recommended to use the function
\link{prepareInputRegions} provided by this package. It pre-populates
the \code{center} column with the absolute genomic coordinate of the
center of the peak region. You can either choose to define the center by
using pre-defined summit information (e.g., obtained from a peak caller like
MACS2) or re-compute the arithmetic mean and save that value in the column
\code{center}. (For details see the help for
\code{\link[=prepareInputRegions]{prepareInputRegions()}}).
}

\subsection{How much to expand from the center}{

You can choose to expand the genomic region from the center either
symmetrically or asymmetrically (different lengths before and after the
center position).

In the symmetrical case, if you want to choose the size of your genomic
region based on the input data, this function can also calculate the median
peak size across all of your genomic regions and use that value (\code{expandBy}
= NULL). Alternatively, you can provide a numeric vector to
define the expansion. A numeric vector with one value is used to
expand symmetrically, while a vector with two values expands
asymmetrically.
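
As a minimal sketch in base R, the arithmetic described for \code{expandBy}
works out as shown below. This is an illustration only, not the internal
code of the package: all variable names are hypothetical, coordinates are
assumed to be 1-based and inclusive, and strand is ignored.

\preformatted{
# Hypothetical center coordinate of a single region
center <- 1000

# Symmetric case: one value, applied in both directions from the center
expand_sym <- 250
start_sym <- center - expand_sym
end_sym <- center + expand_sym
width_sym <- end_sym - start_sym + 1    # 2 * 250 + 1 = 501

# Asymmetric case: first value is subtracted, second value is added
expand_asym <- c(100, 600)
start_asym <- center - expand_asym[1]
end_asym <- center + expand_asym[2]
width_asym <- end_asym - start_asym + 1 # 100 + 600 + 1 = 701
}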
}
}
\examples{
# Load the example data set
utils::data(syn_data_bed)

# Prepare input data
data_prepared <- prepareInputRegions(
  data = syn_data_bed,
  outputFormat = "tibble",
  showMessages = TRUE
)
# Run center and expand
data_center_expand <- centerExpandRegions(
  data = data_prepared,
  centerBy = "center_column",
  expandBy = NULL,
  outputFormat = "tibble",
  showMessages = TRUE
)

data_center_expand
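
# A single numeric value expands symmetrically from the center. Per the
# description of `expandBy`, each region should then span 2 * 250 + 1
# positions. The value 250 and the object name below are arbitrary choices
# for illustration.
data_center_expand_sym <- centerExpandRegions(
  data = data_prepared,
  centerBy = "center_column",
  expandBy = 250,
  outputFormat = "tibble",
  showMessages = FALSE
)

data_center_expand_sym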

# You can choose to use the midpoint and predefined values to expand

data_center_expand <- centerExpandRegions(
  data = data_prepared,
  centerBy = "midpoint",
  expandBy = c(100, 600),
  outputFormat = "tibble",
  showMessages = FALSE
)

data_center_expand
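
# Optional check (a sketch, assuming the tibble output keeps the documented
# `start` and `end` columns with 1-based, inclusive coordinates): per the
# description of `expandBy`, regions expanded with c(100, 600) should span
# 100 + 600 + 1 positions unless their start was trimmed.
region_widths <- data_center_expand$end - data_center_expand$start + 1
table(region_widths)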

}
