% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bbb-step1-fit.R
\name{fitOptimal}
\alias{fitOptimal}
\title{Cross-validation and fit of the optimal asmbPLSDA model}
\usage{
fitOptimal(
  object,
  parallel = FALSE,
  measure = "B_accuracy",
  Method = NULL,
  expected_measure_increase = 0.005,
  maxiter = 100,
  global_significance_full = FALSE,
  CIP.GIP_significance_full = FALSE,
  npermut = 100,
  nbObsPermut = NULL,
  type = "jackknife",
  nsubsampling = 100,
  ...
)
}
\arguments{
\item{object}{A superpathway input list used to fit the optimal asmbPLSDA model.}

\item{parallel}{A boolean indicating whether to parallelize (\code{TRUE})
the LOOCV over quantile combinations or not (\code{FALSE}). Note that this
option is only available for LOOCV, not KCV. Default is \code{FALSE}.}

\item{measure}{Accuracy measure used to select the optimal asmbPLSDA model.
Options are \code{"F1"}, \code{"accuracy"}, \code{"B_accuracy"},
\code{"precision"} and \code{"recall"}. Default is \code{"B_accuracy"}.}

\item{Method}{Decision rule used for prediction. For a binary outcome, the
methods include \code{fixed_cutoff} (default), \code{Euclidean_distance_X}
and \code{Mahalanobis_distance_X}. For a categorical outcome with more than
2 levels, the methods include \code{Max_Y} (default),
\code{Euclidean_distance_X}, \code{Mahalanobis_distance_X},
\code{Euclidean_distance_Y} and \code{PCA_Mahalanobis_distance_Y}. If
\code{NULL}, the default method for the respective outcome type is used.}

\item{expected_measure_increase}{A double giving the relative increase (as a
proportion) in \code{measure} expected after including one more PLS
component; this affects the selection of the optimal number of PLS
components. If \code{NULL}, the default of 0.005 (0.5\%) is used.}

\item{maxiter}{An integer indicating the maximum number of iterations.
If \code{NULL}, the default is 100.}

\item{global_significance_full}{A boolean indicating whether to return a list
with information on each permutation of the global significance test of
asmbPLSDA. By default \code{FALSE}. Note that if the number of permutations
is large, storing this information can be a burden on memory.}

\item{CIP.GIP_significance_full}{A boolean indicating whether to return a
list with the observed and null distributions of CIP and GIP (\code{TRUE})
or only the p-value and adjusted p-value (\code{FALSE}). By default
\code{FALSE}. Note that if the number of permutations is large, storing
this information can be a burden on memory.}

\item{npermut}{Number of permutations for the tests. By default 100.
Passed on to \link{permut_asmbplsda} and \link{CIP_GIP_test}.}

\item{nbObsPermut}{An integer indicating the number of samples to permute
in each permutation. By default \code{NULL}, in which case the number of
samples to permute is chosen at random for each permutation. Passed on to
\link{permut_asmbplsda}.}

\item{type}{Either \code{jackknife} or \code{subsampling}. If
\code{jackknife}, the CIP and GIP observed distributions are generated by a
jackknife procedure. If \code{subsampling}, the CIP and GIP observed
distributions are generated by subsampling the samples without replacement;
each subsample is guaranteed to contain at least 2 samples per class. If
LOOCV was performed or the sample size is small, \code{jackknife} is
recommended; otherwise, \code{subsampling} is recommended. Passed on to
\link{CIP_GIP_test}.}

\item{nsubsampling}{Number of subsamples used to generate the CIP and GIP
observed distributions. By default 100. Passed on to \link{CIP_GIP_test}.}

\item{...}{Other parameters passed on to \link{wilcox_CIP_GIP}, the Wilcoxon
test used in the GIP statistical tests.}
}
\value{
A superpathway fit model list object containing: the superpathway input
list object used for CV and model fitting; a hyperparameters list object
with the hyperparameters used to fit the optimal model (including the
optimal quantiles and number of PLS components from the CV step); a list
with the fitted model information, including the predictor and response
matrices and the observed gene sets (from \code{matrixToBlock}), and the
asmbPLSDA output; and a list with the validation metrics of the fitted
model.
}
\description{
Performs cross-validation (CV) on the provided superpathway input, fits the
optimal model and computes its validation metrics. The CV is either
Leave-One-Out Cross-Validation (LOOCV) or K-Fold Cross-Validation (KCV).
LOOCV is performed if the number of folds is set to 1 or if any class has
fewer than 3 samples. KCV is performed if the number of folds is greater
than or equal to 3 and every class has more samples than the number of
folds. LOOCV is recommended when some classes have few samples.

If KCV is performed, missing values are imputed automatically within the
KCV process. The training set is imputed via \code{missMDA::imputeMFA()},
and a \code{FactoMineR::MFA()} is trained on the imputed training set, from
which the mean of each gene and the estimated loadings are extracted. The
missing values of the validation set are then estimated by projecting its
samples onto the MFA space of the training set. Genes with zero variance
are excluded from the imputation; if a gene has zero variance and all its
observed values are 0, its missing values are imputed as 0.
}
\examples{
# fitOptimal with jackknife for CIP/GIP statistics and 10 permutations
# for the global significance test of the optimal model
file <- system.file("extdata", "example_superpathway_input.rda",
                    package = "singIST")
load(file)
data <- example_superpathway_input
fitOptimal(data, npermut = 10, type = "jackknife")
# fitOptimal with subsampling for CIP/GIP statistics with
# 10 subsamples and 50 permutations for the global significance test of the
# optimal model
fitOptimal(data, npermut = 50, type = "subsampling",
           nsubsampling = 10)
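
# Illustrative sketch only (not singIST code): one plausible reading of
# how expected_measure_increase selects the number of PLS components,
# assuming a component is kept only if the CV measure grows by more than
# that relative amount. select_n_components() and the CV values below are
# hypothetical.
select_n_components <- function(cv_measure, expected_increase = 0.005) {
  n_comp <- 1
  for (i in seq_len(length(cv_measure) - 1)) {
    # Keep component i + 1 only if it beats component i by the threshold
    if (cv_measure[i + 1] > cv_measure[i] * (1 + expected_increase)) {
      n_comp <- i + 1
    } else {
      break
    }
  }
  n_comp
}
# Hypothetical balanced accuracies for 1 to 4 PLS components
select_n_components(c(0.70, 0.78, 0.781, 0.85))  # returns 2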
}
