---
title: "references"
date: "`r doc_date()`"
vignette: >
  % \VignetteIndexEntry{References for metilclock using Bioconductor's ExperimentHub}
  % \VignetteEngine{knitr::rmarkdown}
  % \VignetteEncoding{UTF-8}
output: 
  BiocStyle::html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Overview

The methylclockData package is a repository of a few public datasets that
needs the *methylclock* package to estimate chronological and gestational DNA 
methylation (DNAm) age as well as biological age using different methylation clocks.


## Chronological DNAm age (in years)

- **Horvath's clock**: It uses 353 CpGs described in @horvath2013dna. It was trained using 27K and 450K arrays in samples from different tissues. Other three different age-related biomarkers are also computed:
     - **AgeAcDiff** (DNAmAge acceleration difference): Difference between DNAmAge and chronological age.
     - **IEAA** (Intrinsic Epigenetic Age Acceleration): Residuals obtained after regressing DNAmAge and chronological age adjusted by cell counts.  
     - **EEAA** (Extrinsic Epigenetic Age Acceleration): Residuals obtained after regressing DNAmAge and chronological age. This measure was also known as DNAmAge acceleration residual in the first Horvath's paper.
- **Hannum's clock**: It uses 71 CpGs described in @hannum2013genome. It was trained using 450K array in blood samples. Another are-related biomarer is also computed:
     - **AMAR** (Apparent Methylomic Aging Rate): Measure proposed in @hannum2013genome computed as the ratio between DNAm age and the chronological age.
- **BNN**: It uses Horvath's CpGs to train a Bayesian Neural Network (BNN) to predict DNAm age as described in @alfonso2018.
- **Horvath's skin+blood clock (Horvath2)**: Epigenetic clock for skin and blood cells. It uses 391 CpGs described in @horvath2018epigenetic. It was trained using 450K EPIC arrays in skin and blood sampels.
- **PedBE clock**: Epigenetic clock from buccal epithelial swabs. It's intended purpose is buccal samples from individuals aged 0-20 years old. It uses 84 CpGs described in @mcewen2019pedbe. The authors gathered 1,721 genome-wide DNAm profiles from 11 different cohorts with individuals aged 0 to 20 years old. 
- **Wu's clock**: It uses 111 CpGs described in @wu2019dna. It is designed to predict age in children. It was trained using 27K and 450K.
- **BLUP clock**:  It uses 319607 CpGs described in @zhang2019improved. It was trained using 450K and EPIC arrays in blood (13402 samples) and saliva (259 samples). Age predictors based on training sets with various sample sizes using Best Linear Unbiased Prediction (BLUP)  
- **EN clock**:  It uses 514 CpGs described in @zhang2019improved. It was trained using 450K and EPIC arrays in blood (13402 samples) and saliva (259 samples). Age predictors based on training sets with various sample sizes using Elastic Net (EN) 


## Gestational DNAm age (in weeks)

- **Knight's clock**: It uses 148 CpGs described in @knight2016epigenetic. It was trained using 27K and 450K arrays in cord blood samples.
- **Bohlin's clock**: It uses 96 CpGs described in @bohlin2016prediction. It was trained using 450K array in cord blood samples.
- **Mayne's clock**: It uses 62 CpGs described in @mayne2017accelerated. It was trained using 27K and 450K.
- **EPIC clock**: EPIC-based predictor of gestational age. It uses 176 CpGs described in @haftorn2021epic. It was trained using EPIC arrays in cord blood samples.
- **Lee's clocks**: Three different biological clocks described in @lee2019placental are implemented. It was trained for 450K and EPIC arrays in placenta samples.
     - **RPC clock**: Robust placental clock (RPC). It uses 558 CpG sites.
     - **CPC clock**: Control placental clock (CPC). It usses 546 CpG sites.
     - **Refined RPC clock**: Useful for uncomplicated term pregnancies (e.g. gestational age >36 weeks). It uses 396 CpG sites.


The biological DNAm clocks implemented in our package are:

- **Levine's clock** (also know as PhenoAge): It uses 513 CpGs described in @levine2018epigenetic. It was trained using 27K, 450K and EPIC arrays in blood samples.
- **Telomere Length's clock** (TL): It uses 140 CpGs described in @lu2019dna It was trained using 450K and EPIC arrays in blood samples.


# How to load data

In the below example, we show how one can download this dataset from 
ExperimentHub.

```{r get_experimenthub, warning = FALSE, message=FALSE}
library(ExperimentHub)
library(methylclockData)

# Get experimentHub records
eh <- ExperimentHub()

# Get data about methylclockData experimentHub
pData <- query(eh , "methylclockData")

# Get information rows about methylclockData
df <- mcols(pData)
df

# Retrieve data with Hobarth's clock coefficients
pData["EH6071"]


```

We also implemented some functions to easy access to the different datasets , for example, we can access to Hovarths CpGs to train a Bayesian Neural Network with function `get_cpgs_bn` or to `get_coefHannum` for Hannum's clock coefficients

```{r get_Clocks, warning = FALSE, message=FALSE}

#  Hovarths CpGs to train a Bayesian Neural Network
cpgs.bn <- get_cpgs_bn()
head(cpgs.bn)

# Hannum's clock coefficients
coefHannum <- get_coefHannum()
head(coefHannum)

# Hobarth's clock coefficients
coefHorvath <- get_coefHorvath()
head(coefHorvath)

# Knight's clock coefficients
coefKnightGA <- get_coefKnightGA()
head(coefKnightGA)

# Lee's Gestational Age clock coefficients
coefLeeGA <- get_coefLeeGA()
head(coefLeeGA)

# Levine's clock coefficients
coefLevine <- get_coefLevine()
head(coefLevine)

# Mayne's clock coefficients
coefMayneGA <- get_coefMayneGA()
head(coefMayneGA)

# PedBE's clock coefficients
coefPedBE <- get_coefPedBE()
head(coefPedBE)

#  Horvath's skin+blood clock coefficients
coefSkin <- get_coefSkin()
head(coefSkin)

# Telomere Length clock coefficients
coefTL <- get_coefTL()
head(coefTL)

# BLUP clock coefficients
coefBLUP <- get_coefBLUP()
head(coefBLUP)

# EN clock coefficients
coefEN <- get_coefEN()
head(coefEN)

# EPIC clock coefficients
coefEPIC <- get_coefEPIC()
head(coefEPIC)

#  Wu's clock coefficients
Wu <- get_coefWu()
head(Wu)

# # references
references <- get_references()
load(references)

```


For more information in how loading and use of the data, please, refer to [`MethylClock` vignette](https://github.com/isglobal-brge/methylclock) 


# sessionInfo()

```{r sessionInfo}
sessionInfo()
```
