% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/createCompDbPackage.R
\name{createCompDb}
\alias{createCompDb}
\alias{createCompDbPackage}
\alias{make_metadata}
\alias{emptyCompDb}
\title{Create a CompDb database}
\usage{
createCompDb(x, metadata, msms_spectra, path = ".", dbFile = character())

createCompDbPackage(
  x,
  version,
  maintainer,
  author,
  path = ".",
  license = "Artistic-2.0"
)

make_metadata(
  source = character(),
  url = character(),
  source_version = character(),
  source_date = character(),
  organism = NA_character_
)

emptyCompDb(dbFile = character())
}
\arguments{
\item{x}{For \code{createCompDb()}: \code{data.frame} or \code{tbl} with the compound
annotations or \code{character} with the file name(s) from which the compound
annotations should be retrieved. See description for details.

For \code{createCompDbPackage()}: \code{character(1)} with the file name of
the \code{CompDb} SQLite file (created by \code{createCompDb()}).}

\item{metadata}{For \code{createCompDb()}: \code{data.frame} with metadata
information. See description for details.}

\item{msms_spectra}{For \code{createCompDb()}: \code{data.frame} with MS/MS spectrum
data. See \code{\link[=msms_spectra_hmdb]{msms_spectra_hmdb()}} for the expected format and for a
function to import such data from HMDB spectrum XML files.}

\item{path}{\code{character(1)} with the path to the directory where the database
file or package folder should be written. Defaults to the current
directory.}

\item{dbFile}{\code{character(1)} to optionally provide the name of the SQLite
database file. If not provided (the default) the database name is defined
using information from the provided \code{metadata}.}

\item{version}{For \code{createCompDbPackage()}: \code{character(1)} with the version
of the package (ideally in the format \code{"x.y.z"}).}

\item{maintainer}{For \code{createCompDbPackage()}: \code{character(1)} with the
name and email address of the package maintainer (in the form
\code{"First Last <first.last@provider.com>"}).}

\item{author}{For \code{createCompDbPackage()}: \code{character(1)} with the name
of the package author.}

\item{license}{For \code{createCompDbPackage()}: \code{character(1)} with the
license of the package or, where applicable, of the originating provider.}

\item{source}{For \code{make_metadata()}: \code{character(1)} with the name of the
resource that provided the compound annotation.}

\item{url}{For \code{make_metadata()}: \code{character(1)} with the url to the original
resource.}

\item{source_version}{For \code{make_metadata()}: \code{character(1)} with the version
of the original resource providing the annotation.}

\item{source_date}{For \code{make_metadata()}: \code{character(1)} with the date of
the resource's release.}

\item{organism}{For \code{make_metadata()}: \code{character(1)} with the name of the
organism. This should be in the format \code{"Hsapiens"} for human,
\code{"Mmusculus"} for mouse etc. Leave as \code{NA} if not applicable.}
}
\value{
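For \code{createCompDb()}: a \code{character(1)} with the database name
(invisibly).

For \code{emptyCompDb()}: a \code{CompDb} representing the empty database.

For \code{make_metadata()}: a \code{data.frame} in the format expected by
\code{createCompDb()}.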
}
\description{
\code{CompDb} databases can be created with the \code{createCompDb()} or the
\code{emptyCompDb()} functions, the former creating and initializing (filling)
the database with existing data, the latter creating an empty database that
can be subsequently filled with \code{\link[=insertCompound]{insertCompound()}} or \code{\link[=insertSpectra]{insertSpectra()}}
calls.

\code{emptyCompDb()} requires only the file name of the database that should be
created as input and returns a \code{CompDb} representing the empty database.

\code{createCompDb()} creates a \code{SQLite}-based \code{\link{CompDb}} object/database
from a compound resource provided as a \code{data.frame} or \code{tbl}. Alternatively,
the name(s) of the file(s) from which the annotation should be extracted can
be provided. Supported are SDF files (such as those from the
\emph{Human Metabolome Database}, HMDB) that can be read using
\code{\link[=compound_tbl_sdf]{compound_tbl_sdf()}} or LipidBlast files (see \code{\link[=compound_tbl_lipidblast]{compound_tbl_lipidblast()}}).

An additional \code{data.frame} providing metadata information including the data
source, date, version and organism is mandatory. By default, the function
will define the name of the database based on the provided metadata, but it
is also possible to define this manually with the \code{dbFile} parameter.

Optionally, MS/MS (MS2) spectra for the compounds can also be stored in the
database. Spectra from HMDB can be downloaded in XML format from HMDB
(http://www.hmdb.ca) and loaded with \code{\link[=msms_spectra_hmdb]{msms_spectra_hmdb()}}; spectra from
MoNa can be imported from an SDF file with \code{\link[=msms_spectra_mona]{msms_spectra_mona()}}. The
resulting \code{data.frame} is passed to \code{createCompDb()} with the
\code{msms_spectra} argument. See \code{\link[=msms_spectra_hmdb]{msms_spectra_hmdb()}} or
\code{\link[=msms_spectra_mona]{msms_spectra_mona()}} for information on the expected columns and format.

Required columns for the \code{data.frame} providing the compound information
(parameter \code{x}) are:
\itemize{
\item \code{"compound_id"}: the ID of the compound. Can be an \code{integer} or
\code{character}. Duplicated IDs are supported (for compatibility reasons), but
not recommended. Missing values are not allowed.
\item \code{"name"}: the compound's name.
\item \code{"inchi"}: the InChI of the compound.
\item \code{"inchikey"}: the InChI key.
\item \code{"formula"}: the chemical formula.
\item \code{"exactmass"}: the compound's (exact) mass.
\item \code{"synonyms"}: additional synonyms/aliases for the compound. Should be
either a single character or a list of values for each compound.
}

Any additional columns in the provided \code{data.frame} (e.g. \code{"smiles"}
providing the compound's SMILES) are also supported and will be inserted into
the database table.

See e.g. \code{\link[=compound_tbl_sdf]{compound_tbl_sdf()}} or \code{\link[=compound_tbl_lipidblast]{compound_tbl_lipidblast()}} for functions
creating such compound tables.

The table containing the MS2 spectra data should have the following format
and columns:
\itemize{
\item \code{"spectrum_id"}: an arbitrary ID for the spectrum. Has to be an \code{integer}.
\item \code{"compound_id"}: the ID of the compound with which the spectrum is
associated. This has to be present in the \code{data.frame} defining the
compounds.
\item \code{"polarity"}: the polarity (as an \code{integer}, \code{0} for negative, \code{1} for
positive, \code{NA} for not set).
\item \code{"collision_energy"}: the collision energy.
\item \code{"predicted"}: whether the spectrum was predicted or measured.
\item \code{"splash"}: the SPLASH of the spectrum.
\item \code{"instrument_type"}: the instrument type.
\item \code{"instrument"}: the name of the instrument.
\item \code{"precursor_mz"}: the precursor m/z (as a \code{numeric}).
\item \code{"mz"}: the m/z values.
\item \code{"intensity"}: the intensity values.
}

A value has to be provided in each row of the \code{data.frame} only for columns
\code{"spectrum_id"}, \code{"compound_id"}, \code{"mz"} and \code{"intensity"}; all other
columns are optional. The \code{data.frame} can be in one of two formats. In the
first (as in the example below), each row represents one spectrum and columns
\code{"mz"} and \code{"intensity"} are of type \code{list}, each element holding the m/z
or intensity values of one spectrum. In the second, \emph{full} form, each row
represents one \emph{peak}, columns \code{"mz"} and \code{"intensity"} are of type
\code{numeric}, and all other columns repeat the information of the respective
spectrum.

The metadata \code{data.frame} should have two columns named \code{"name"} and
\code{"value"} providing the following minimal information as key-value pairs
(see \code{make_metadata()} for a utility function to create such a \code{data.frame}):
\itemize{
\item \code{"source"}: the source from which the data was retrieved (e.g. \code{"HMDB"}).
\item \code{"url"}: the URL from which the original data was retrieved.
\item \code{"source_version"}: the version from the original data source
(e.g. \code{"v4"}).
\item \code{"source_date"}: the date when the original data source was generated.
\item \code{"organism"}: the organism. Should be in the form \code{"Hsapiens"} or
\code{"Mmusculus"}.
}

\code{createCompDbPackage()} creates an R data package with the data from a
\code{\link{CompDb}} object.

\code{make_metadata()} helps to generate a metadata \code{data.frame} in
the format expected by the \code{createCompDb()} function. The function
returns a \code{data.frame}.
}
\details{
Metadata information is also used to create the file name for the database
file. The name starts with \code{"CompDb"}, followed by the organism, the
data source and its version. A compound database file for HMDB version 4
with human metabolites will thus be named: \code{"CompDb.Hsapiens.HMDB.v4"}.

A single \code{CompDb} database can be created from multiple SDF files (e.g. for
\emph{PubChem}) by providing all file names with parameter \code{x}. Parallel
processing is currently not enabled because SQLite does not yet support it
natively.
}
\examples{

## Read compounds for a HMDB subset
fl <- system.file("sdf/HMDB_sub.sdf.gz", package = "CompoundDb")
cmps <- compound_tbl_sdf(fl)

## Create a metadata data.frame for the compounds.
metad <- data.frame(name = c("source", "url", "source_version",
    "source_date", "organism"), value = c("HMDB", "http://www.hmdb.ca",
    "v4", "2017-08-27", "Hsapiens"))

## Alternatively use the make_metadata helper function
metad <- make_metadata(source = "HMDB", source_version = "v4",
    source_date = "2017-08", organism = "Hsapiens",
    url = "http://www.hmdb.ca")
## Create a SQLite database in the temporary folder
db_f <- createCompDb(cmps, metadata = metad, path = tempdir())

## The database can be loaded and accessed with a CompDb object
db <- CompDb(db_f)
db
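
## A minimal, hypothetical sketch of a hand-built compound table with the
## required columns described above. The IDs are made up for illustration
## and "inchi"/"inchikey" are left as NA placeholders.
cmps_manual <- data.frame(
    compound_id = c("CP_0001", "CP_0002"),
    name = c("Glucose", "Caffeine"),
    inchi = NA_character_,
    inchikey = NA_character_,
    formula = c("C6H12O6", "C8H10N4O2"),
    exactmass = c(180.0634, 194.0804),
    stringsAsFactors = FALSE)
## Synonyms can hold several values per compound, hence a list column
cmps_manual$synonyms <- list(c("Dextrose", "Grape sugar"), "Guaranine")

## List any required columns that are missing from the table
req_cols <- c("compound_id", "name", "inchi", "inchikey", "formula",
    "exactmass", "synonyms")
setdiff(req_cols, colnames(cmps_manual))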

## Create a database for HMDB that also includes MS/MS spectrum data
metad2 <- make_metadata(source = "HMDB_with_spectra", source_version = "v4",
    source_date = "2017-08", organism = "Hsapiens",
    url = "http://www.hmdb.ca")

## Import spectrum information from selected MS/MS xml files from HMDB
## that are provided in the package
xml_path <- system.file("xml", package = "CompoundDb")
spctra <- msms_spectra_hmdb(xml_path)

## Create a SQLite database in the temporary folder
db_f2 <- createCompDb(cmps, metadata = metad2, msms_spectra = spctra,
    path = tempdir())

## The database can be loaded and accessed with a CompDb object
db2 <- CompDb(db_f2)
db2

## Does the database contain MS/MS spectrum data?
hasMsMsSpectra(db2)
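
## A sketch of the two supported layouts of an MS/MS spectra table, built
## by hand. All values are made up; the compound IDs are hypothetical and
## would have to be present in the compound table passed to createCompDb().
## Layout 1: one row per spectrum, "mz" and "intensity" as list columns
spctra_manual <- data.frame(
    spectrum_id = 1:2,
    compound_id = c("CP_0001", "CP_0001"),
    polarity = c(1L, 0L),
    stringsAsFactors = FALSE)
spctra_manual$mz <- list(c(56.05, 84.04, 127.04), c(59.01, 89.02))
spctra_manual$intensity <- list(c(120, 830, 410), c(95, 640))

## Layout 2 ("full" form): one row per peak, with the spectrum variables
## repeated for each peak of a spectrum
n_peaks <- lengths(spctra_manual$mz)
spctra_full <- data.frame(
    spectrum_id = rep(spctra_manual$spectrum_id, n_peaks),
    compound_id = rep(spctra_manual$compound_id, n_peaks),
    polarity = rep(spctra_manual$polarity, n_peaks),
    mz = unlist(spctra_manual$mz),
    intensity = unlist(spctra_manual$intensity),
    stringsAsFactors = FALSE)
spctra_full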

## Create a database for a ChEBI subset providing the file name of the
## corresponding SDF file
metad <- make_metadata(source = "ChEBI_sub", source_version = "2",
    source_date = NA, organism = "Hsapiens", url = "www.ebi.ac.uk/chebi")
db_f <- createCompDb(system.file("sdf/ChEBI_sub.sdf.gz",
    package = "CompoundDb"), metadata = metad, path = tempdir())
db <- CompDb(db_f)
db

compounds(db)
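
## A sketch of creating an empty CompDb: only a database file name is
## needed (here an arbitrary temporary file). The empty database could
## subsequently be filled with insertCompound() or insertSpectra().
db_empty <- emptyCompDb(tempfile(fileext = ".sqlite"))
db_empty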

## Connect to the database and query its tables using RSQLite
library(RSQLite)
con <- dbConnect(dbDriver("SQLite"), db_f)

dbGetQuery(con, "select * from metadata")
dbGetQuery(con, "select * from ms_compound")

## Close the connection once it is no longer needed
dbDisconnect(con)

## To create a CompDb R-package we could simply use the
## createCompDbPackage function on the SQLite database file name.
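## A sketch of such a call, writing the package folder to the temporary
## directory. The version, author and maintainer values are placeholders
## chosen for illustration only.
createCompDbPackage(db_f, version = "0.0.1", author = "First Last",
    maintainer = "First Last <first.last@provider.com>", path = tempdir())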
}
\seealso{
\code{\link[=compound_tbl_sdf]{compound_tbl_sdf()}} and \code{\link[=compound_tbl_lipidblast]{compound_tbl_lipidblast()}} for functions
to extract compound annotations from files in SDF format, or files from
LipidBlast.

\code{\link[=import_mona_sdf]{import_mona_sdf()}} to import both the compound and spectrum data from an
SDF file from MoNa (MassBank of North America) in one call.

\code{\link[=msms_spectra_hmdb]{msms_spectra_hmdb()}} and \code{\link[=msms_spectra_mona]{msms_spectra_mona()}} for functions to import
MS/MS spectrum data from XML files from HMDB or an SDF file from MoNa.

\code{\link[=CompDb]{CompDb()}} for how to use a compound database.
}
\author{
Johannes Rainer
}
