% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Spectra-functions.R, R/Spectra.R
\name{concatenateSpectra}
\alias{concatenateSpectra}
\alias{combineSpectra}
\alias{joinSpectraData}
\alias{split}
\alias{cbind2}
\alias{c,Spectra-method}
\alias{cbind2,Spectra,dataframeOrDataFrameOrmatrix-method}
\alias{split,Spectra,ANY-method}
\title{Merging, aggregating and splitting Spectra}
\usage{
concatenateSpectra(x, ...)

combineSpectra(
  x,
  f = x$dataStorage,
  p = x$dataStorage,
  FUN = combinePeaksData,
  ...,
  BPPARAM = bpparam()
)

joinSpectraData(x, y, by.x = "spectrumId", by.y, suffix.y = ".y")

\S4method{c}{Spectra}(x, ...)

\S4method{cbind2}{Spectra,dataframeOrDataFrameOrmatrix}(x, y, ...)

\S4method{split}{Spectra,ANY}(x, f, drop = FALSE, ...)
}
\arguments{
\item{x}{A \code{Spectra} object.}

\item{...}{Additional arguments.}

\item{f}{For \code{split()}: factor defining how to split \code{x}. See \code{\link[base:split]{base::split()}}
for details.
For \code{combineSpectra()}: \code{factor} defining the grouping of the spectra
that should be combined. Defaults to \code{x$dataStorage}.}

\item{p}{For \code{combineSpectra()}: \code{factor} defining how to split the input
\code{Spectra} for parallel processing. Defaults to \code{x$dataStorage}, i.e.,
depending on the used backend, per-file parallel processing will be
performed.}

\item{FUN}{For \code{combineSpectra()}: function to combine the peak matrices
of the spectra. Defaults to \code{\link[=combinePeaksData]{combinePeaksData()}}.}

\item{BPPARAM}{Parallel setup configuration. See \code{\link[BiocParallel:register]{BiocParallel::bpparam()}}
for more information. This is passed directly to the
\code{\link[=backendInitialize]{backendInitialize()}} method of the \linkS4class{MsBackend}.}

\item{y}{For \code{joinSpectraData()}: \code{DataFrame} with the spectra variables
to join/add. For \code{cbind2()}: a \code{data.frame}, \code{DataFrame} or
\code{matrix}. The number and order of its rows have to match the
number and order of the spectra in \code{x}.}

\item{by.x}{A \code{character(1)} specifying the spectra variable used
for merging. Default is \code{"spectrumId"}.}

\item{by.y}{A \code{character(1)} specifying the column used for
merging. Set to \code{by.x} if missing.}

\item{suffix.y}{A \code{character(1)} specifying the suffix to be used
for making the names of columns in the merged spectra variables
unique. This suffix will be used to amend \code{names(y)}, while
\code{spectraVariables(x)} will remain unchanged.}

\item{drop}{For \code{split()}: not considered.}
}
\description{
Various functions are available to combine, aggregate or split data from one
or more \code{Spectra} objects. These are:
\itemize{
\item \code{c()} and \code{concatenateSpectra()}: combines several \code{Spectra} objects into
a single object. The resulting \code{Spectra} contains all data from all
individual \code{Spectra}, i.e. the union of all their spectra variables.
Concatenation will fail if the processing queue of any of the \code{Spectra}
objects is not empty or if different backends are used for the \code{Spectra}
objects. In such cases it is suggested to first change the backends of
all \code{Spectra} to the same type of backend (using the \code{\link[=setBackend]{setBackend()}}
function) and, if needed, to apply the processing queue using
the \code{\link[=applyProcessing]{applyProcessing()}} function.
\item \code{cbind2()}: Appends multiple spectra variables from a \code{data.frame},
\code{DataFrame} or \code{matrix} to the \code{Spectra} object at once. The order of
the values (rows) in \code{y} has to match the order of spectra in \code{x}. The
function does not allow replacing existing spectra variables. \code{cbind2()}
returns a \code{Spectra} object with the appended spectra variables. For a more
controlled way of adding spectra variables, see the \code{joinSpectraData()}
function.
\item \code{combineSpectra()}: combines sets of spectra (defined with parameter \code{f})
into a single spectrum per set aggregating their MS data (i.e. their
\emph{peaks data} matrices with the \emph{m/z} and intensity values of their
mass peaks). The spectra variable values of the first spectrum per set
are reported for the combined spectrum. The peak matrices of the spectra
per set are combined using the function specified with parameter \code{FUN}
which uses by default the \code{\link[=combinePeaksData]{combinePeaksData()}} function. See the
documentation of \code{\link[=combinePeaksData]{combinePeaksData()}} for details on the aggregation of
the peak data and the package vignette for examples.
The sets of spectra can be specified with parameter \code{f} which is expected
to be a \code{factor} or \code{vector} of length equal to the length of the
\code{Spectra} specifying to which set each spectrum belongs. The function
returns a \code{Spectra} of length equal to the number of unique levels of \code{f}. The
optional parameter \code{p} defines how the \code{Spectra} should be
split for potential parallel processing. The default is
\code{p = x$dataStorage} and hence a per storage file parallel processing is
applied for \code{Spectra} with on-disk data representations (such as the
\code{\link[=MsBackendMzR]{MsBackendMzR()}}). This also prevents spectra from different data
files/samples from being combined (if needed, use e.g. \code{p = x$dataOrigin} or any
other spectra variable defining the originating sample of a spectrum).
Before combining the peaks data, any pending processing steps are
applied (by calling \code{\link[=applyProcessing]{applyProcessing()}} on the \code{Spectra}). This function
will replace the original \emph{m/z} and intensity values of a \code{Spectra}, hence
it cannot be called on a \code{Spectra} with a \emph{read-only} backend. In such
cases, the backend should first be changed to a \emph{writeable} backend
using the \code{\link[=setBackend]{setBackend()}} function (e.g. to a \code{\link[=MsBackendMemory]{MsBackendMemory()}} backend).
\item \code{joinSpectraData()}: Individual spectra variables can be directly
added with the \verb{$<-} or \verb{[[<-} syntax. The \code{joinSpectraData()}
function merges a \code{DataFrame} into the existing spectra
data of a \code{Spectra}. This function diverges from the \code{\link[=merge]{merge()}} method in
three main ways:
\itemize{
\item The \code{by.x} and \code{by.y} column names must be of length 1.
\item If variable names are shared in \code{x} and \code{y}, the spectra
variables of \code{x} are not modified. It's only the \code{y}
variables that are appended with the suffix defined in
\code{suffix.y}. This is to avoid modifying any core spectra
variables that would lead to an invalid object.
\item Duplicated Spectra keys (i.e. \code{x[[by.x]]}) are not
allowed. Duplicated keys in the \code{DataFrame} (i.e. \code{y[[by.y]]})
throw a warning and only the last occurrence is kept. These
should be explored and ideally be removed using, for example,
\code{QFeatures::reduceDataFrame()}, \code{PSMatch::reducePSMs()} or similar
functions.
For a more general function that appends a \code{data.frame},
\code{DataFrame} or \code{matrix}, see \code{cbind2()}.
}
\item \code{split()}: splits the \code{Spectra} object based on parameter \code{f} into a \code{list}
of \code{Spectra} objects.
}
}
\examples{

## Create a Spectra by providing a `DataFrame` containing MS data.

spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))

s <- Spectra(spd)
s

## Create a second Spectra from mzML files and use the `MsBackendMzR`
## on-disk backend.
sciex_file <- dir(system.file("sciex", package = "msdata"),
    full.names = TRUE)
sciex <- Spectra(sciex_file, backend = MsBackendMzR())
sciex

## Subset to the first 100 spectra to reduce running time of the examples
sciex <- sciex[1:100]


##  --------  COMBINE SPECTRA  --------

## Combining the `Spectra` object `s` with the MS data from `sciex`.
## Calling `c(s, sciex)` directly would result in an error because
## the two objects use different backends. We thus first have to change
## them to the same backend. We change the backend of the `sciex`
## `Spectra` to a `MsBackendMemory`, the backend used by `s`.

sciex <- setBackend(sciex, MsBackendMemory())

## Combine the two `Spectra`
all <- c(s, sciex)
all

## The new `Spectra` object contains the union of spectra variables from
## both:
spectraVariables(all)

## The spectra variables that were not present in `s`:
setdiff(spectraVariables(all), spectraVariables(s))

## The values for these were filled with missing values for spectra from
## `s`:
all$peaksCount |> head()
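
## `concatenateSpectra()` is the function-style counterpart of `c()`, so the
## call below should give a `Spectra` of the same length as `all`
## (`all2` is only an illustrative name).
all2 <- concatenateSpectra(s, sciex)
length(all2) == length(all)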


##  --------  AGGREGATE SPECTRA  --------

## Sets of spectra can be combined into a single, representative spectrum
## per set using `combineSpectra()`. This aggregates the peaks data (i.e.
## the spectra's m/z and intensity values) while using the values for all
## spectra variables from the first spectrum per set. Below we define the
## sets as all spectra measured in the *same second*, i.e. rounding their
## retention time to the nearest integer value.
f <- round(rtime(sciex))
head(f)

cmp <- combineSpectra(sciex, f = f)

## The length of `cmp` is now equal to the number of unique values in `f`:
length(cmp)

## The spectra variable value from the first spectrum per set is used in
## the representative/combined spectrum:
cmp$rtime

## The peaks data was aggregated: the number of mass peaks of the first six
## spectra from the original `Spectra`:
lengths(sciex) |> head()

## and for the first aggregated spectra:
lengths(cmp) |> head()

## The default peaks data aggregation method joins all mass peaks. See
## documentation of the `combinePeaksData()` function for more options.
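
## A minimal sketch of parameter `p`: using `p = sciex$dataOrigin` splits
## the data by originating file before combining, so spectra from different
## files are never merged even when they share a value in `f`. This assumes
## `dataOrigin` identifies the sample of each spectrum; `cmp_origin` is only
## an illustrative name.
cmp_origin <- combineSpectra(sciex, f = f, p = sciex$dataOrigin)
length(cmp_origin)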


##  --------  SPLITTING DATA  --------

## A `Spectra` can be split into a `list` of `Spectra` objects using the
## `split()` function, with parameter `f` defining the sets into which the
## `Spectra` should be split.
sciex_split <- split(sciex, f)

length(sciex_split)
sciex_split |> head()
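
## The `list` returned by `split()` can be merged back into a single
## `Spectra`. The sketch below assumes that `concatenateSpectra()` also
## accepts a `list` of `Spectra` objects as its first argument
## (`sciex_joined` is only an illustrative name).
sciex_joined <- concatenateSpectra(sciex_split)
length(sciex_joined) == length(sciex)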


##  --------  ADDING SPECTRA DATA  --------

## Adding new spectra variables
sciex1 <- filterDataOrigin(sciex, dataOrigin(sciex)[1])
spv <- DataFrame(spectrumId = sciex1$spectrumId[3:12], ## used for merging
                 var1 = rnorm(10),
                 var2 = sample(letters, 10))
spv

sciex2 <- joinSpectraData(sciex1, spv, by.y = "spectrumId")

spectraVariables(sciex2)
spectraData(sciex2)[1:13, c("spectrumId", "var1", "var2")]

## Append new spectra variables with cbind2()
df <- data.frame(cola = seq_len(length(sciex1)), colb = "b")
data_append <- cbind2(sciex1, df)
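
## The columns of `df` should now be listed among the spectra variables of
## the returned object and can be accessed like any other spectra variable.
c("cola", "colb") %in% spectraVariables(data_append)
data_append$cola |> head()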
}
\seealso{
\itemize{
\item \code{\link[=combinePeaks]{combinePeaks()}} for functions to aggregate mass peaks data.
\item \link{Spectra} for a general description of the \code{Spectra} object.
}
}
\author{
Sebastian Gibb, Johannes Rainer, Laurent Gatto
}
