---
title: "Using ChIPDBData with TFEA.ChIP"
author: "Yosra Berrouayel and Luis del Peso"
date: "`r Sys.Date()`"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Using ChIPDBData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(TFEA.ChIP)
```

# Overview

The `ChIPDBData` package provides curated ChIP-seq transcription factor target databases designed for use with `r Biocpkg("TFEA.ChIP")`.

Each dataset contains a collection of ChIP-seq experiments (e.g., from ENCODE) along with their associated gene targets. These datasets are structured as **`ChIPDB` list objects** and can be accessed either directly through ExperimentHub or via the `getChIPDB()` function.

**Important:** When loading any dataset, make sure it is assigned to an object named `ChIPDB`. This is crucial, as `TFEA.ChIP` looks for a globally defined object called `ChIPDB` and will **not recognize** it under any other name.

# Installation

To install the package, start R and enter:

```{r lib-install, message=FALSE, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ChIPDBData")
```

Once `ChIPDBData` is installed, it can be loaded with the following command:

```{r load-lib, message=FALSE, warning=FALSE, cache=FALSE}
library(ChIPDBData)
```

# Available Datasets

The following datasets are currently available in the `ChIPDBData` package:

- ENCODE rE2G (complete)
- ENCODE rE2G subsets filtered by:
    - Score thresholds: 0.25, 0.5, 0.75
    - Depth thresholds: 50, 100, 200, 300
- CREdb
- GeneHancer

These can be accessed via the **ExperimentHub** interface:

```{r load-chipdb-eh}
library(ExperimentHub)

eh <- ExperimentHub()
dbs <- query(eh, "ChIPDBData")
dbs

# Example: load the ENCODE rE2G dataset filtered by depth >= 300 (rE2G300d)
ChIPDB <- dbs[["EH9854"]]  # IMPORTANT: Assign to 'ChIPDB'
```

Alternatively, you can retrieve datasets programmatically using `getChIPDB()` with any of the following identifiers:

- `"ENCODE_rE2G"`
- `"ENCODE_rE2G_25score"`, `"ENCODE_rE2G_50score"`, `"ENCODE_rE2G_75score"`
- `"ENCODE_rE2G_50depth"`, `"ENCODE_rE2G_100depth"`, `"ENCODE_rE2G_200depth"`, `"ENCODE_rE2G_300depth"`
- `"CREdb"`
- `"GeneHancer"`

For example:

```{r load-chipdb}
# Load the ENCODE dataset filtered by depth >= 300
ChIPDB <- getChIPDB("ENCODE_rE2G_300depth")
```
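Because the identifier is a plain string, a typo is easy to make. The following is a minimal sketch of a pre-flight check, using only base R and the identifiers listed above. The helper `check_chipdb_id()` is hypothetical and is not part of `ChIPDBData`; how `getChIPDB()` itself reacts to an unknown identifier is not covered here.

```{r validate-id}
# Identifiers documented above
chipdb_ids <- c(
  "ENCODE_rE2G",
  "ENCODE_rE2G_25score", "ENCODE_rE2G_50score", "ENCODE_rE2G_75score",
  "ENCODE_rE2G_50depth", "ENCODE_rE2G_100depth",
  "ENCODE_rE2G_200depth", "ENCODE_rE2G_300depth",
  "CREdb", "GeneHancer"
)

# Hypothetical helper (not part of ChIPDBData): returns the identifier
# unchanged if it is one of the documented values, otherwise stops.
check_chipdb_id <- function(id) {
  if (!is.character(id) || length(id) != 1L || !(id %in% chipdb_ids)) {
    stop("Unknown dataset identifier. Valid values are: ",
         paste(chipdb_ids, collapse = ", "), call. = FALSE)
  }
  id
}

id <- check_chipdb_id("ENCODE_rE2G_300depth")
id
# The validated identifier could then be passed on, e.g. getChIPDB(id)
```

The check uses exact matching rather than `match.arg()`, so abbreviated identifiers are rejected instead of being partially matched.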

A `ChIPDB` object is a named list with two main components:

1. A **character vector of Entrez Gene IDs**, representing the universe of possible targets.
2. A **named list of ChIP-seq experiments**, where each element is a vector of integer indices pointing to the genes in component 1. Each entry represents the gene targets of a transcription factor in a specific experiment.

Exploring the structure:

```{r explore-db}
# List names of the top-level elements
names(ChIPDB)

# Preview the first few Entrez IDs
ChIPDB[[1]][1:5]

# View names of ChIP-seq experiments
names(ChIPDB[[2]])[1:3]

# Show gene indices for the first experiment
ChIPDB[[2]][[1]][1:5]

# Get actual gene IDs from those indices
ChIPDB[[1]][ ChIPDB[[2]][[1]][1:5] ]
```
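To make the index-based layout concrete without downloading anything, here is a small sketch built on a toy object. The element names, gene IDs, and experiment names are all made up for illustration; only the two-component layout (a character vector of gene IDs, plus a list of integer index vectors) follows the description above.

```{r toy-chipdb}
# Toy object mimicking the two-component layout of a ChIPDB list.
# All names and IDs here are illustrative, not taken from a real dataset.
toy_db <- list(
  gene_keys = c("1001", "1002", "1003", "1004", "1005"),
  chip_targets = list(
    exp_A.TF1 = c(1L, 3L, 5L),
    exp_B.TF2 = c(2L, 3L)
  )
)

# Translate the integer indices of every experiment into gene IDs
toy_targets <- lapply(toy_db[[2]], function(idx) toy_db[[1]][idx])
toy_targets

# Number of target genes per experiment
lengths(toy_db[[2]])

# Which experiments list gene "1003" as a target?
gene_idx <- match("1003", toy_db[[1]])
hits <- names(Filter(function(idx) gene_idx %in% idx, toy_db[[2]]))
hits
```

Storing indices rather than repeating gene IDs in every experiment keeps the object compact, since each ID is held only once in the first component.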

# Integration with `TFEA.ChIP`

To perform transcription factor enrichment analysis, start by loading your differential expression data and defining the regulated and control gene sets. Ensure that your ChIP-seq database is loaded and assigned to `ChIPDB`. The `TFEA.ChIP` functions will automatically use this object for analysis.

**Important:** Load `ChIPDB` after running `library(TFEA.ChIP)`. Otherwise, the package's default database (a limited subset from GeneHancer) will overwrite it.
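Before running the analysis, it can be worth confirming that an object named `ChIPDB` really is present in the global environment. The sketch below uses only base R. The helper `has_global_chipdb()` is hypothetical and is not part of `TFEA.ChIP` or `ChIPDBData`, and it assumes the code is evaluated in a session where the global environment is the workspace in use.

```{r check-chipdb-global}
# Hypothetical helper, not part of TFEA.ChIP or ChIPDBData.
# Returns TRUE when the global environment holds a list named 'ChIPDB'
# with at least two elements (gene IDs and experiment targets).
has_global_chipdb <- function() {
  if (!exists("ChIPDB", envir = globalenv(), inherits = FALSE)) {
    return(FALSE)
  }
  db <- get("ChIPDB", envir = globalenv())
  is.list(db) && length(db) >= 2L
}

has_global_chipdb()
```

This only checks that the object exists and has a plausible shape; it does not tell you which dataset is loaded.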

```{r tfea-analysis}
# Load and preprocess differential expression table
data('hypoxia_DESeq')
hypoxia_table <- preprocessInputData(hypoxia_DESeq)

# Define gene sets
Genes.Upreg <- Select_genes(hypoxia_table, min_LFC = 1)
Genes.Control <- Select_genes(hypoxia_table,
  min_pval = 0.5, max_pval = 1,
  min_LFC = -0.25, max_LFC = 0.25
)

# Run TF enrichment
CM_list <- contingency_matrix(Genes.Upreg, Genes.Control)
results <- getCMstats(CM_list)

# Display results
head(results)
```

# Session Info

```{r session-info}
sessionInfo()
```
