% \VignetteIndexEntry{MethylAid-summarized data on 2800 Illumina 450k array samples}  
% \VignettePackage{MethylAidData}
% \VignetteEngine{knitr::knitr}

\documentclass[12pt]{article}
\usepackage[english]{babel} %%remove tildes e.g. in bibliography

<<style, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex()
@

\bioctitle[MethylAid-summarized data]{MethylAid-summarized data for Illumina 450k (N=2800) and EPIC (N=2620) arrays}

\author[1]{Davy Cats}
\author[2]{Tyler J Gorrie-Stone}
\author[1]{Bastiaan T Heijmans}
\author[3,4]{John W Holloway}
\author{BIOS Consortium}
\author[1]{Maarten van Iterson}
\author[4]{Faisal I. Rezwan}
\author[2]{Leonard Schalkwyk}

\affil[1]{Dept. of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands}
\affil[2]{School of Biological Sciences, University of Essex, Essex, UK.}
\affil[3]{Human Development and Health, Faculty of Medicine, University of Southampton, UK}
\affil[4]{Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, UK}


\begin{document}

\maketitle

\section{Introduction}
\Biocpkg{MethylAidData} contains \Biocpkg{MethylAid}-summarized data
on 2800 Illumina 450k array samples and 2620 Illumina EPIC/850k array
samples that can be used as a reference when processing newly generated
methylation array data using \Biocpkg{MethylAid}.

The data on 450k arrays is a subset from a large-scale multi-omics
study conducted by several Dutch biobanks, the BIOS consortium
(http://www.bbmri.nl/en-gb/activities/rainbow-projects/bios)
\cite{Bonder2017}. The raw Illumina 450k array data (idat-files) are
available through the EGA archive
(https://ega-archive.org/dacs/EGAC00001000277).

MethylAid-summarized data for EPIC methylation arrays stems from
studies led by the University of Southampton (N=1434)
\cite{Burney1994} and the University of Essex (N=1186).

The summarization performed by \Biocpkg{MethylAid}
entails the following for each sample:
\begin{enumerate}
\item calculation of the median Methylated and Unmethylated
  intensities
\item extraction of all quality control probe intensities
\item construction of quality control metrics e.g. sample-dependent,
  sample-independent and detection p-values
\item storing everything efficiently to allow fast rendering of the
  various quality control plots provided by \Biocpkg{MethylAid}.
\end{enumerate}
See van Iterson \emph{et al.}~\cite{Iterson2014} for a detailed
description of \Biocpkg{MethylAid}.

\section{Preparation of the data}
The raw idat-files can be obtained from EGA (e.g. accession number
EGAC00001000277, after approval by the Data Access Committee). Once
the raw idat-files have been downloaded and a targets file has been
constructed, \Biocpkg{MethylAid} can be used to summarize the data and
perform quality control using the interactive
\Rpackage{shiny}~\cite{shiny} application.
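
As an illustration of constructing a targets file, the following is a
minimal sketch rather than the procedure used for this package. It
assumes that the idat-files sit in a single directory and that
\Biocpkg{MethylAid} expects a column named \texttt{Basename} holding
the file paths without the \texttt{\_Grn.idat}/\texttt{\_Red.idat}
suffix. The directory and sample names are hypothetical placeholders.

<<exampleTargets, eval=FALSE>>=
## hypothetical directory standing in for the downloaded idat-files
idat_dir <- file.path(tempdir(), "idats")
dir.create(idat_dir, showWarnings = FALSE)
## empty placeholder files, for illustration only
file.create(file.path(idat_dir,
                      c("S1_R01C01_Grn.idat", "S1_R01C01_Red.idat",
                        "S2_R01C01_Grn.idat", "S2_R01C01_Red.idat")))
## one row per sample: strip the channel suffix from the green-channel files
grn <- list.files(idat_dir, pattern = "_Grn\\.idat$", full.names = TRUE)
targets <- data.frame(Basename = sub("_Grn\\.idat$", "", grn),
                      stringsAsFactors = FALSE)
@

Additional sample annotation (e.g. plate or array position) can be
added as further columns of \texttt{targets}.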

Data sets of this size are preferably summarized in parallel and in
batches to avoid long run times or memory
issues. \Biocpkg{MethylAid} provides several options to do this using
the \Biocpkg{BiocParallel} package~\cite{biocparallel}. For example, if
multiple cores are available, these could be used like this:

<<exampleDataLarge, eval=FALSE>>=
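library(MethylAid)
library(BiocParallel) ## provides MulticoreParam
## targets: sample sheet (data.frame) constructed from the EGA download
BPPARAM <- MulticoreParam(workers = 8, verbose=TRUE)
## summarize on 8 cores, in batches of 100 samples
summarize(targets, batchSize = 100, BPPARAM = BPPARAM, file="exampleDataLarge")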
@

Another option would be to use a cluster; see the vignette of
\Biocpkg{MethylAid} for how to set this up.

\section{Using MethylAidData}
The summarized data contained in \Biocpkg{MethylAidData} can be used in
two ways: 1) to explore a large data set using \Biocpkg{MethylAid},
and 2) as a background data set on top of your own
data. Since version 1.1.4, \Biocpkg{MethylAid} can show a background
data set in the filter control plots. As such, the data
can be used as a reference data set and can give guidance when
removing outlying samples. Furthermore, the data support
the default thresholds used to determine outlying samples.
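
A minimal sketch of the second use is given below. It assumes that the
450k reference object in this package is named
\texttt{exampleDataLarge} (the name used when summarizing above) and
uses the small \texttt{exampleData} object shipped with
\Biocpkg{MethylAid} as a stand-in for your own summarized data.

<<exampleBackground, eval=FALSE>>=
library(MethylAid)
library(MethylAidData)
data(exampleData)      ## stand-in for your own summarized data
data(exampleDataLarge) ## 450k reference data (object name is an assumption)
## launch the interactive application with the reference set as background
visualize(exampleData, background = exampleDataLarge)
@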

Additionally, since version 1.1.4, \Biocpkg{MethylAid} offers
functionality to construct your own background data: several
summarizedData-objects can be merged into one larger
summarizedData-object to use as your own reference or to determine
filter thresholds, for example for hydroxymethylation data, for which
no thresholds are currently available.
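
Merging could look like the sketch below. It assumes that
\Biocpkg{MethylAid} provides a \Rfunction{combine} method for
summarizedData-objects; please check the \Biocpkg{MethylAid}
documentation for the exact interface. The two objects used here are
only stand-ins for summarizedData-objects of your own.

<<exampleCombine, eval=FALSE>>=
library(MethylAid)
library(MethylAidData)
data(exampleData)      ## stand-in for a first summarizedData-object
data(exampleDataLarge) ## stand-in for a second one (name is an assumption)
## merge into one larger summarizedData-object (assumed combine method)
ownReference <- combine(exampleData, exampleDataLarge)
## use the merged object as background for newly generated data
visualize(exampleData, background = ownReference)
@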

\bibliography{MethylAidData}

\end{document}
