% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/reduce.R
\name{reduceDataFrame}
\alias{reduceDataFrame}
\alias{expandDataFrame}
\title{Reduces and expands a \code{DataFrame}}
\usage{
reduceDataFrame(x, k, count = FALSE, simplify = TRUE, drop = FALSE)

expandDataFrame(x, k = NULL)
}
\arguments{
\item{x}{The \code{DataFrame} to be reduced or expanded.}

\item{k}{A ‘vector’ of length \code{nrow(x)} defining the grouping
based on which the \code{DataFrame} will be shrunk.}

\item{count}{\code{logical(1)} specifying whether an additional column
(called by default \code{.n}) with the tally of rows shrunk into one
new row should be added. Note that if a column named \code{.n}
already exists, it will be silently overwritten.}

\item{simplify}{A \code{logical(1)} defining if invariant columns
should be converted to simple lists. Default is \code{TRUE}.}

\item{drop}{A \code{logical(1)} specifying whether the non-invariant
columns should be dropped altogether. Default is \code{FALSE}.}
}
\value{
A reduced (or expanded) \code{DataFrame}.
}
\description{
A long dataframe can be \emph{reduced} by merging certain rows into a
single one. The resulting variables are constructed as a \code{SimpleList}
containing all the original values. Invariant columns, i.e. columns
that have the same value along all the rows that need to be
merged, can be shrunk into a new variable containing that
invariant value (rather than into list columns). The grouping of
rows, i.e. the rows that need to be shrunk together as one, is
defined by a vector.

The opposite operation is \emph{expand}. Note, however, that for a
\code{DataFrame} to be expanded back, it must not have been simplified.
}
\section{Missing values}{


Missing values have an important effect on \code{reduce}. Unless all
values to be reduced are missing, they will result in a
non-invariant column, which will be dropped with \code{drop = TRUE}. See
the example below.

The presence of missing values can have side effects in higher
level functions that rely on reduction of \code{DataFrame} objects.
}

\examples{
library("IRanges")

k <- sample(100, 1e3, replace = TRUE)
df <- DataFrame(k = k,
                x = round(rnorm(length(k)), 2),
                y = seq_len(length(k)),
                z = sample(LETTERS, length(k), replace = TRUE),
                ir = IRanges(seq_along(k), width = 10),
                r = Rle(sample(5, length(k), replace = TRUE)),
                invar = k + 1)
df

## Shrinks the DataFrame
df2 <- reduceDataFrame(df, df$k)
df2

## With a tally of the number of members in each group
reduceDataFrame(df, df$k, count = TRUE)

## Much faster, but more crowded result
df3 <- reduceDataFrame(df, df$k, simplify = FALSE)
df3

## Drop all non-invariant columns
reduceDataFrame(df, df$k, drop = TRUE)

## Missing values
d <- DataFrame(k = rep(1:3, each = 3),
               x = letters[1:9],
               y = rep(letters[1:3], each = 3),
               y2 = rep(letters[1:3], each = 3))
d

## y is invariant and can be simplified
reduceDataFrame(d, d$k)
## y isn't dropped
reduceDataFrame(d, d$k, drop = TRUE)

## BUT with a missing value
d[1, "y"] <- NA
d

## y isn't invariant/simplified anymore
reduceDataFrame(d, d$k)
## y now gets dropped
reduceDataFrame(d, d$k, drop = TRUE)
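
## A rough base R illustration of why a single missing value makes a
## group non-invariant. This is only a sketch and not the package's
## internal code: it assumes that a group is invariant when it holds
## exactly one distinct value, with NA counted as a value of its own.
## The helper name is_invariant is hypothetical.
is_invariant <- function(v) length(unique(v)) == 1L

## One NA among identical values: two distinct values, not invariant
is_invariant(c(NA, "a", "a"))
## No missing values: invariant
is_invariant(c("b", "b", "b"))
## All values missing: still a single distinct value, hence invariant
is_invariant(c(NA, NA, NA))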
}
\author{
Laurent Gatto
}
