% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregateToPseudoBulk.R
\name{aggregateToPseudoBulk}
\alias{aggregateToPseudoBulk}
\title{Aggregation of single-cell to pseudobulk data}
\usage{
aggregateToPseudoBulk(
  x,
  assay = NULL,
  sample_id = NULL,
  cluster_id = NULL,
  fun = c("sum", "mean", "median", "prop.detected", "num.detected", "sem", "number"),
  scale = FALSE,
  verbose = TRUE,
  BPPARAM = SerialParam(progressbar = verbose),
  checkValues = TRUE,
  h5adBlockSizes = 1e+09
)
}
\arguments{
\item{x}{a \code{\link[SingleCellExperiment]{SingleCellExperiment}}.}

\item{assay}{character string specifying the assay slot to use as input data. Defaults to the first available assay (\code{assayNames(x)[1]}).}

\item{sample_id}{character string specifying which \code{colData(x)} variable to use as sample id.}

\item{cluster_id}{character string specifying which \code{colData(x)} variable to use as cluster id.}

\item{fun}{a character string.
Specifies the function to use as summary statistic.
Passed to \code{summarizeAssayByGroup2}.}

\item{scale}{logical. Should pseudobulks be scaled
by the effective library size and multiplied by 1 million?}

\item{verbose}{logical. Should information on progress be reported?}

\item{BPPARAM}{a \code{\link[BiocParallel]{BiocParallelParam}}
object specifying how aggregation should be parallelized.}

\item{checkValues}{logical. Should we check that signal values are positive integers?}

\item{h5adBlockSizes}{sets the automatic block size (in bytes) that DelayedArray uses when reading an H5AD file. Larger values use more memory but are faster.}
}
\value{
a \code{\link[SingleCellExperiment]{SingleCellExperiment}}.

Aggregation parameters (\code{assay, by, fun, scaled}) are stored in \code{metadata()$agg_pars}, where \code{by = c(cluster_id, sample_id)}. The number of cells that were aggregated is accessible in \code{int_colData()$n_cells}.
}
\description{
Aggregation of single-cell to pseudobulk data. Adapted from \code{muscat::aggregateData}, with similar syntax and the same results, but can be much faster for a \code{SingleCellExperiment} backed by an H5AD file using on-disk storage.
}
\details{
Adapted from \code{muscat::aggregateData}, with similar syntax and the same results. This function is much faster for a \code{SingleCellExperiment} backed by an H5AD file using \code{DelayedMatrix}, because it summarizes counts using \code{\link[DelayedMatrixStats]{DelayedMatrixStats}}. It also includes optimizations for the \code{sparseMatrix} format used by \code{\link[Seurat]{Seurat}}, by using \code{sparseMatrixStats}.

Variables from \code{colData()} that are constant within \code{sample_id} are retained. For example, sex is constant for all cells from the same \code{sample_id}, so it is kept as a variable in the pseudobulk result. In contrast, the number of expressed genes varies across cells within each \code{sample_id}, so it is dropped from \code{colData()}. Instead, its mean value per cell type is stored in \code{metadata(pb)$aggr_means}, and these means can be included in regression formulas downstream. In that case, the covariate value used for each sample depends on the cell type being analyzed.
}
\examples{
library(muscat)
library(SingleCellExperiment)

data(example_sce)

# create pseudobulk for each sample and cell cluster
pb <- aggregateToPseudoBulk(example_sce,
  assay = "counts",
  cluster_id = "cluster_id",
  sample_id = "sample_id",
  verbose = FALSE
)

# pseudobulk data from each cell type
# is stored as its own assay
pb
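
# A short sketch of inspecting the result, assuming the accessors
# named in the Value and Details sections of this page.
# Names of the assays (one per cluster)
assayNames(pb)

# aggregation parameters (assay, by, fun, scaled)
metadata(pb)$agg_pars

# number of cells that were aggregated
int_colData(pb)$n_cells

# means of cell-level covariates that vary within sample_id
# (may be empty if no such covariates are present)
metadata(pb)$aggr_means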

# aggregate by cluster only,
# collapsing all samples into the same pseudobulk
pb2 <- aggregateToPseudoBulk(example_sce,
  cluster_id = "cluster_id",
  verbose = FALSE
)

pb2
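
# A sketch of choosing a different summary statistic via 'fun',
# assuming the "counts" assay used above: take the mean within
# each cluster and sample instead of the sum
pb_mean <- aggregateToPseudoBulk(example_sce,
  assay = "counts",
  cluster_id = "cluster_id",
  sample_id = "sample_id",
  fun = "mean",
  verbose = FALSE
)

pb_mean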
}
\references{
Crowell, HL, Soneson, C, Germain, P-L, Calini, D,
Collin, L, Raposo, C, Malhotra, D & Robinson, MD:
Muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data.
\emph{Nature Communications} \strong{11(1):6077} (2020).
doi: \url{https://doi.org/10.1038/s41467-020-19894-4}
}
\author{
Gabriel Hoffman, Helena L Crowell & Mark D Robinson
}
