% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Sanity.R
\name{Sanity}
\alias{Sanity}
\alias{Sanity,ANY-method}
\alias{Sanity,SummarizedExperiment-method}
\alias{Sanity,SingleCellExperiment-method}
\title{Estimate gene-level expression using the Sanity model}
\usage{
Sanity(x, ...)

\S4method{Sanity}{ANY}(
  x,
  size.factors = NULL,
  vmin = 0.001,
  vmax = 50,
  nbin = 160L,
  a = 1,
  b = 0,
  BPPARAM = bpparam()
)

\S4method{Sanity}{SummarizedExperiment}(x, ..., assay.type = "counts", name = "logcounts", subset.row = NULL)

\S4method{Sanity}{SingleCellExperiment}(x, size.factors = sizeFactors(x), ...)
}
\arguments{
\item{x}{A numeric matrix of counts where features are rows and columns are cells.

Alternatively, a \linkS4class{SummarizedExperiment} or a
\linkS4class{SingleCellExperiment} containing such counts.}

\item{...}{For the generic, further arguments to pass to each method.

For the \code{SummarizedExperiment} method, further arguments to pass to the \code{ANY}
method.

For the \code{SingleCellExperiment} method, further arguments to pass to the
\code{SummarizedExperiment} method.}

\item{size.factors}{A numeric vector of cell-specific size factors.
If \code{NULL}, all cells are assumed to have the same library size (see Details).}

\item{vmin}{The minimum value for the gene-level variance (must be > 0).}

\item{vmax}{The maximum value for the gene-level variance.}

\item{nbin}{Number of bins in the variance grid spanning \code{vmin} to \code{vmax}.}

\item{a, b}{Shape (\code{a}) and rate (\code{b}) parameters of the Gamma prior (see Details).}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object specifying whether
the calculations should be parallelized.}

\item{assay.type}{A string specifying the assay of \code{x} containing the count matrix.}

\item{name}{A string specifying the assay name under which to store the output normalized values.}

\item{subset.row}{A vector specifying the subset of rows of \code{x} to process.}
}
\value{
For a matrix-like object, a named list with the following elements
(symbols as defined in the Supplementary Text of the publication):
\describe{
\item{mu}{Posterior mean of log expression across cells \eqn{\mu_g}.}
\item{var_mu}{Posterior variance of the mean expression \eqn{\left(\delta \mu_g\right)^2}.}
\item{var}{Posterior variance of expression across cells \eqn{\langle v_g \rangle}.}
\item{delta}{Log fold-changes of each cell relative to the mean expression \code{mu}, \eqn{\delta_{gc}}.}
\item{var_delta}{Posterior variance of the cell-level fold-changes \eqn{\epsilon_{gc}^2}.}
\item{lik}{Normalized likelihood across the evaluated variance grid \eqn{P\left(v_g \mid n_g \right)} for diagnostics.}
}

If called on a \linkS4class{SingleCellExperiment} or \linkS4class{SummarizedExperiment},
the input object is returned with the following columns appended to its \code{rowData}:
\describe{
\item{sanity_log_activity_mean}{\code{mu}}
\item{sanity_log_activity_mean_sd}{\code{sqrt(var_mu)}}
\item{sanity_activity_sd}{\code{sqrt(var)}}
}
and with the following assays added (assuming \code{name = "logcounts"}):
\describe{
\item{assay(x, "logcounts")}{\code{mu + delta}}
\item{assay(x, "logcounts_sd")}{\code{sqrt(var_mu + var_delta)}}
}
}
\description{
Estimate gene-level log expression and cell-level log fold-changes from a
count matrix using the Bayesian Sanity model, together with their posterior
variances.
}
\details{
The method models gene activity using a Bayesian framework, assuming
a Gamma prior on expression and integrating over cell-level variability.
It returns posterior estimates for mean expression (\code{mu}), cell-specific
deviations (\code{delta}), and their variances, as well as expression variance
(\code{var}). \emph{Expected} log-normalized counts are computed by combining mean
expression and cell-specific log-fold changes. The \emph{standard deviation} of
log-counts is computed by summing the variances of the components.

If \code{size.factors} is \code{NULL}, all size factors are assumed equal, so that
every cell is treated as having the same library size, \code{mean(colSums(x))}.
\subsection{Gamma Prior}{

The model places a Gamma prior \code{Gamma(a, b)} on the gene activity, where
\code{a} is the shape parameter and \code{b} is the rate parameter. This allows
for flexible regularization and uncertainty modeling. The posterior likelihood
is estimated by integrating over possible values of the expression variance.

Intuitively:
\itemize{
\item \code{a} acts as a pseudo-count added to the total count of the gene.
\item \code{b} acts as a pseudo-count penalizing deviations from the average
expression, i.e., it regularizes the total number of UMIs that differ from
the expected value.
}

Setting \code{a = 1} and \code{b = 0} corresponds to an uninformative (uniform) prior,
which was used in the original Sanity model publication.
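
For reference, and assuming the standard shape-rate parameterization described
above, the Gamma density over a positive quantity \eqn{x} is
\deqn{p(x \mid a, b) \propto x^{a - 1} e^{-b x},}{p(x | a, b) ~ x^(a - 1) * exp(-b * x),}
which is constant in \eqn{x} when \code{a = 1} and \code{b = 0}. This is why
that setting acts as a uniform (improper) prior.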
}
}
\examples{
library(SingleCellExperiment)

sce <- simulate_independent_cells(N_cell = 500, N_gene = 100)

# Standard Sanity normalization
sce_norm <- Sanity(sce)
logcounts(sce_norm)[1:5,1:5]
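
# Gene-level summaries appended to rowData
# (column names as listed in the Value section)
head(rowData(sce_norm)[, c("sanity_log_activity_mean",
                           "sanity_log_activity_mean_sd",
                           "sanity_activity_sd")])

# Per-cell standard deviations of the log-normalized values,
# stored alongside the default "logcounts" assay
assay(sce_norm, "logcounts_sd")[1:5, 1:5]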

# Using size factors
sf <- colSums(counts(sce))
sizeFactors(sce) <- sf / mean(sf)
sce_norm2 <- Sanity(sce)
logcounts(sce_norm2)[1:5,1:5]
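
# Matrix method: a minimal sketch, assuming the count matrix extracted from
# the object is accepted by the ANY method. A named list is returned instead
# of a modified object; the "logcounts" assay documented under Value
# corresponds to mu + delta.
res <- Sanity(counts(sce))
names(res)
head(res$mu)

# Storing the normalized values under a different assay name
# (the name "sanity" is an arbitrary, illustrative choice)
sce_alt <- Sanity(sce, name = "sanity")
assayNames(sce_alt)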

}
\references{
Breda, J., Zavolan, M., & van Nimwegen, E. (2021).
Bayesian inference of gene expression states from single-cell RNA-seq data.
\emph{Nature Biotechnology}, 39, 1008–1016. \url{https://doi.org/10.1038/s41587-021-00875-x}
}
