\name{messinaSurv}
\alias{messinaSurv}
\title{Find optimal prognostic features using the Messina algorithm}
\usage{
messinaSurv(x, y, obj_min, obj_func, min_group_frac = 0.1, f_train = 0.8,
  n_boot = 50, seed = NULL, parallel = NULL, silent = FALSE)
}
\arguments{
  \item{x}{feature expression values, either supplied as an
  ExpressionSet, or as an object that can be converted to a
  matrix by as.matrix.  In the latter case, features should
  be in rows and samples in columns, with feature names
  taken from the rows of the object.}

  \item{y}{a Surv object containing survival times and
  censoring status for each sample.}

  \item{obj_min}{the minimum acceptable value of the
  objective metric.  The metric used is specified by the
  parameter obj_func.}

  \item{obj_func}{the metric function that measures the
  difference in survival between patients with feature
  values above, and below, the threshold.  Valid values are
  "tau", "reltau", or "coxcoef"; see details for more
  information.}

  \item{min_group_frac}{the size of the smallest sample
  group that is allowed to be generated by thresholding, as
  a fraction of the total sample.  The default value of 0.1
  means that no threshold will be selected if it splits the
  samples so that either group contains fewer than 10\% of
  the samples.  A modest value of this parameter increases
  the stability of the "reltau" and "coxcoef" objectives,
  which tend to become unstable as the number of samples in
  a group becomes very low; see details.}

  \item{f_train}{the fraction of samples to be used in the
  training splits of the bootstrap rounds.}

  \item{n_boot}{the number of bootstrap rounds to use.}

  \item{seed}{an optional random seed for the analysis.  If
  NULL, the current state of the R PRNG is used unchanged.}

  \item{parallel}{should calculations be parallelized using
  the doMC framework?  If NULL, parallel mode is used if
  the doMC library is loaded, and more than one core has
  been registered with registerDoMC().  Note that no
  progress bar is displayed in parallel mode.}

  \item{silent}{be completely silent (except for error and
  warning messages)?}
}
\value{
an object of class "MessinaSurvResult" containing the
results of the analysis.
}
\description{
Run the MessinaSurv algorithm to find features (e.g. genes)
that can define groups of patients with very different
survival times.
}
\details{
The MessinaSurv algorithm aims to identify features for
which patients with high signal and patients with low
signal have very different survival outcomes.  This is
achieved by defining an objective function which assigns
a numerical value to how strongly the survival in two
groups of patients differs, then assessing the value of
this objective at different signal levels of each feature.
Those features for which, at a given signal level, the
objective function is consistently above a user-supplied
minimum level, are selected by MessinaSurv as being
single-feature survival predictors.

MessinaSurv has applications as an algorithm to identify
features that are survival-related, as well as a principled
method to identify threshold signal values to separate a
cohort into poor- and good-prognosis subgroups.  It can
also be used as a feature filter, selecting and
discretising survival-related features before they are
input into a multivariate predictor.
}
\section{Objective functions}{
  MessinaSurv uses the value of its objective function as a
  measure of the strength of the difference in survival of
  the two patient groups defined by the threshold.  Three
  objective functions are currently defined: \describe{
  \item{"coxcoef"}{The coefficient of a Cox proportional
  hazards fit to the model Surv ~ I(x > T), where x is the
  feature signal level, and T is the threshold being
  tested.  Range is (-inf, inf), with a no-information
  value of 0; positive values indicate that the subgroup
  defined by signal above the threshold fails sooner.}
  \item{"tau"}{Kendall's tau for survival data, defined as
  (concordant + tied/2) / (concordant + discordant + tied),
  where concordant is the number of concordant
  group/survival pairs, discordant is the number of
  discordant group/survival pairs, and tied is the total
  number of tied pairs, counting both group and survival
  ties.  Concordance is calculated expecting that samples
  with signal exceeding the threshold will fail sooner.
  Range is [0, 1], with a no-information value of 0.5.
  Note that the ties terms naturally penalize very high or
  low thresholds, and so this objective is inappropriate if
  somewhat unbalanced subgroups are expected to be present
  in the data.}
  \item{"reltau"}{tau, normalized to remove the ties
  penalty.  Defined as concordant / (concordant + discordant).
  Range is [0, 1], with a no-information value of 0.5.
  Because the ties penalty of tau is removed, this method
  is suitable for finding unbalanced subgroups, but it is
  unstable at extreme threshold values (as in these cases,
  concordant + discordant -> 0).  For this reason,
  min_group_frac must be set to a modest value when using
  "reltau", to preserve stability.}
  }

  Methods "coxcoef" and "reltau" are unstable at very high
  and very low threshold values, and so should be used with
  an appropriate value of min_group_frac for stable fits.
  Method "tau" is stable at extreme threshold values, and
  therefore tolerates min_group_frac = 0.  Note, however,
  that "tau" naturally penalizes small subgroups, and is
  therefore a poor choice unless you wish to find
  approximately equal-sized subgroups.
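
  As a minimal sketch of how the three objectives behave (this
  is not the package's internal implementation), the following
  R code evaluates "tau" and "reltau" from hypothetical pair
  counts, and fits the "coxcoef" model at a single threshold.
  The pair counts, the threshold of 65, and the use of the lung
  data set from the survival package are all assumptions made
  purely for illustration, as is the reading of "reltau" as
  concordant / (concordant + discordant).
  \preformatted{
library(survival)

# Hypothetical pair counts for one candidate threshold.
concordant <- 30
discordant <- 10
tied <- 60

# "tau": tied pairs pull the value towards 0.5.
tau <- (concordant + tied / 2) / (concordant + discordant + tied)

# "reltau": tied pairs are ignored (assumed reading of the definition).
reltau <- concordant / (concordant + discordant)

# "coxcoef": coefficient of the Cox model Surv ~ I(x > T).
y <- Surv(lung$time, lung$status == 2)  # status 2 marks a death in lung
x <- lung$age                           # stand-in for a feature signal
threshold <- 65                         # hypothetical threshold T
coxcoef <- unname(coef(coxph(y ~ I(x > threshold))))
}
  With these counts tau is 0.6 while reltau is 0.75, which
  shows how a large number of tied pairs shrinks tau towards
  its no-information value.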
}

\section{Minimum group fraction}{
  The parameter min_group_frac limits the size of the
  smallest subgroups that messinaSurv can select.  As the
  groups become smaller, the "reltau" and "coxcoef"
  objective functions become unstable, and can generate
  spurious results.  These are seen on the diagnostics
  produced by the messina plot functions as very high
  objective values at very low and high threshold values.
  To control these results, set min_group_frac to a high
  enough value that the objective functions reliably fit.
  Generally, max(0.1, 10/N), where N is the total number of
  patients, is sufficient.  Keep in mind that setting this
  parameter too high will limit messinaSurv's ability to
  identify small subsets of patients with dramatically
  different survival from the rest: the smallest subset
  that can be reliably identified contains a fraction
  min_group_frac of the patients.
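
  The rule of thumb above can be sketched as a small helper.
  The function name suggest_min_group_frac and the cohort sizes
  are hypothetical, chosen only to illustrate the arithmetic;
  the helper is not part of the package.
  \preformatted{
# Rule of thumb from the text: max(0.1, 10 / N),
# where N is the total number of patients.
suggest_min_group_frac <- function(N) max(0.1, 10 / N)

suggest_min_group_frac(60)    # 10/60 exceeds 0.1, so about 0.167
suggest_min_group_frac(200)   # 10/200 is below 0.1, so 0.1
}
  For small cohorts the 10/N term dominates, so that each
  group contains at least about ten patients.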
}
\examples{
## Load a subset of the TCGA renal clear cell carcinoma data
## as an example.
data(tcga_kirc_example)

## Run the messinaSurv analysis on these data.  Use a tau
## objective, with a minimum performance of 0.6.  Note that
## messinaSurv analyses are very computationally intensive,
## so in practice running on multiple cores with doMC and
## parallel = TRUE is strongly recommended.
fit = messinaSurv(kirc.exprs, kirc.surv, obj_func = "tau", obj_min = 0.6)

fit
plot(fit)
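
## A sketch of the same analysis with a fixed random seed and
## parallel mode enabled.  The seed value and the core count
## are arbitrary choices for illustration, and the block is
## not run by default because it requires the doMC package.
\dontrun{
library(doMC)
registerDoMC(cores = 2)
fit_par = messinaSurv(kirc.exprs, kirc.surv, obj_func = "tau",
  obj_min = 0.6, seed = 1234, parallel = TRUE)
fit_par
}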
}
\author{
Mark Pinese \email{m.pinese@garvan.org.au}
}
\seealso{
\code{\link{MessinaSurvResult-class}}

\code{\link[Biobase]{ExpressionSet}}

\code{\link{messina}}

\code{\link{messinaDE}}
}

