% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/FPR_Simulation.R
\name{FPR_Simulation}
\alias{FPR_Simulation}
\title{FPR Simulation Plot}
\usage{
FPR_Simulation(
  data,
  metadata,
  original_signatures,
  Variable,
  gene_list = NULL,
  number_of_sims = 100,
  title = NULL,
  widthTitle = 30,
  titlesize = 12,
  pointSize = 2,
  labsize = 10,
  mode = c("none", "simple", "medium", "extensive"),
  ColorValues = NULL,
  ncol = NULL,
  nrow = NULL
)
}
\arguments{
\item{data}{A data frame or matrix of gene expression values (genes as rows,
samples as columns).}

\item{metadata}{A data frame containing metadata for the samples (columns of
\code{data}).}

\item{original_signatures}{A named list of gene signatures. Each element can
be either:
\itemize{
\item A vector of gene names (unidirectional), or
\item A data frame with columns \code{"Gene"} and \code{"Signal"} for bidirectional signatures.
}}

\item{Variable}{Name of a column in \code{metadata} giving the variable of
interest for grouping or regression. The column can be categorical or numeric.}

\item{gene_list}{A character vector of gene names from which simulated
signatures are generated by sampling. If \code{NULL} (default), all genes in
\code{data} are used.}

\item{number_of_sims}{Integer. Number of simulated gene signatures to
generate per original signature (default: 100).}

\item{title}{Optional title for the overall plot.}

\item{widthTitle}{Integer. Max width for wrapping the title text (default:
30).}

\item{titlesize}{Numeric. Font size for the title text (default: 12).}

\item{pointSize}{Numeric. Size of the points representing simulations
(default: 2).}

\item{labsize}{Numeric. Font size for axis labels (default: 10).}

\item{mode}{A string specifying the level of detail for contrasts. Options
are:
\itemize{
\item \code{"none"}: Compares all levels of \code{Variable} (default).
\item \code{"simple"}: Performs the minimal number of pairwise comparisons between
individual group levels (e.g., A - B, A - C).
\item \code{"medium"}: Includes comparisons between one group and the union of all
other groups (e.g., A - (B + C + D)), enabling broader contrasts beyond simple pairs.
\item \code{"extensive"}: Allows for all possible algebraic combinations of group levels
(e.g., (A + B) - (C + D)), supporting flexible and complex contrast definitions.
}}

\item{ColorValues}{Named vector of colors for plot points, typically
\code{Original} and \code{Simulated}. If \code{NULL}, default colors are used.}

\item{ncol}{Integer. Number of columns for arranging signature plots in a
grid layout. If \code{NULL}, layout is auto-calculated.}

\item{nrow}{Integer. Number of rows for arranging signature plots in a grid
layout. If \code{NULL}, layout is auto-calculated.}
}
\value{
Invisibly returns a list containing:
\describe{
\item{\code{plot}}{A combined \code{ggplot} object assembled with \code{ggarrange}; one
violin plot is generated per signature and contrast.
Observed values are highlighted against the simulated distribution.
Significance (adjusted p-value \eqn{<=} 0.05) is indicated by point shape.}
\item{\code{data}}{A list of data frames, one for each signature, containing
the original and simulated effect sizes.}
}
}
\description{
This function simulates false positive rates (FPR) by generating simulated
gene signatures and comparing the observed effect sizes (Cohen's \emph{d} or
\emph{f}) of the original signatures with those of the simulated signatures.
Effect sizes are computed for each of three scoring methods (\code{ssGSEA},
\code{logmedian}, and \code{ranking}), and the results are visualized as violin
plots with the observed values overlaid.
}
\details{
The function supports both categorical and numeric variables:
\itemize{
\item For \strong{categorical variables}, Cohen's \emph{d} is used, and contrasts are
defined by the \code{mode} parameter unless \code{mode = "none"}.
\item For \strong{numeric variables}, Cohen's \emph{f} is used to quantify associations
through linear modeling.
}

For each original gene signature, a number of simulated signatures are
created by sampling genes from \code{gene_list}. Each simulated signature is
scored using three methods, and its effect size is computed relative to the
variable of interest. The resulting distributions are shown as violins,
overlaid with the observed value from the original signature. A red dashed
line marks the 95th percentile of the simulated distribution per method.
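
The idea behind the simulation can be sketched in base R. The block below is
an illustrative sketch only and is not the package's internal code: the helper
names \code{score_signature} and \code{cohen_d} are hypothetical, the per-sample
mean is a stand-in for the three scoring methods, and drawing simulated
signatures of the same size as the original, as well as comparing absolute
effect sizes, are assumptions made for the illustration.

\preformatted{
# Toy expression matrix: 20 genes (rows) by 10 samples (columns)
set.seed(1)
expr <- matrix(abs(rnorm(200)), nrow = 20,
               dimnames = list(paste0("Gene", 1:20), paste0("Sample", 1:10)))
group <- rep(c("A", "B"), each = 5)
signature <- c("Gene1", "Gene2", "Gene3")

# Hypothetical stand-in score: mean expression of the signature genes per sample
score_signature <- function(genes) colMeans(expr[genes, , drop = FALSE])

# Cohen's d between groups A and B, using the pooled standard deviation
cohen_d <- function(x, g) {
  a <- x[g == "A"]
  b <- x[g == "B"]
  pooled_sd <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
                      (length(a) + length(b) - 2))
  (mean(a) - mean(b)) / pooled_sd
}

# Effect size of the original signature
observed <- cohen_d(score_signature(signature), group)

# Effect sizes of 100 random signatures drawn from the gene pool
# (same size as the original signature; an assumption of this sketch)
simulated <- replicate(100, {
  random_genes <- sample(rownames(expr), length(signature))
  cohen_d(score_signature(random_genes), group)
})

# 95th percentile of the simulated absolute effect sizes, analogous to the
# dashed reference line in the plot
threshold <- quantile(abs(simulated), 0.95)
abs(observed) > threshold
}

An observed effect size that does not exceed this threshold is of a magnitude
that random gene sets from the same pool reach at least about 5\% of the time.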

Internally, the function calls \code{CohenD_allConditions()} for categorical
variables and \code{CohenF_allConditions()} for numeric variables.
}
\examples{
# Simulate gene expression matrix (genes as rows, samples as columns)
set.seed(444)
expr <- as.data.frame(matrix(abs(rnorm(60)), nrow = 6, ncol = 10))
rownames(expr) <- paste0("Gene", 1:6)
colnames(expr) <- paste0("Sample", 1:10)

# Simulate sample metadata with a categorical variable
metadata <- data.frame(
  sample = colnames(expr),
  Condition = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE
)

# Define two gene signatures (as character vectors)
signatures <- list(
  Sig1 = c("Gene1", "Gene2", "Gene3"),
  Sig2 = c("Gene4", "Gene5")
)

# Run FPR simulation (with fewer sims for speed in example)
FPR_Simulation(
  data = expr,
  metadata = metadata,
  original_signatures = signatures,
  Variable = "Condition",
  number_of_sims = 20,
  title = "FPR Simulation Example",
  pointSize = 3
)
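
# The result is returned invisibly, so assign it to inspect the components
# described in the Value section. This reuses the objects defined above.
res <- FPR_Simulation(
  data = expr,
  metadata = metadata,
  original_signatures = signatures,
  Variable = "Condition",
  number_of_sims = 20
)
res$plot                       # combined plot
str(res$data, max.level = 1)   # one data frame per signature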

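# Sketch of a run with a numeric variable of interest, for which Cohen's f is
# used (see Details). "Age" is a hypothetical column added for illustration
# only; it is not part of the example metadata above.
metadata_num <- metadata
metadata_num$Age <- seq(30, 75, by = 5)  # one value per sample (10 samples)

FPR_Simulation(
  data = expr,
  metadata = metadata_num,
  original_signatures = signatures,
  Variable = "Age",
  number_of_sims = 20
)
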
}
