---
title: "The SpatialDatasets package"
author:
- name: Nicholas Robertson
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
- name: Farhan Ameen
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
- name: Alex Qin
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
  - Westmead Institute for Medical Research, University of Sydney, Australia
- name: Ellis Patrick
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
  - Westmead Institute for Medical Research, University of Sydney, Australia
package: SpatialDatasets
output: 
  BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{The SpatialDatasets package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

The `SpatialDatasets` package contains a collection of spatially resolved omics datasets formatted as [SpatialExperiment](https://bioconductor.org/packages/SpatialExperiment), [MoleculeExperiment](https://bioconductor.org/packages/MoleculeExperiment) or [CytoImageList](https://bioconductor.org/packages/cytomapper) Bioconductor objects, for use in examples, demonstrations, and tutorials. The datasets come from several platforms, including IMC, MIBI-TOF, Xenium, CosMx and MERFISH, and were obtained from publicly available sources.

# Installation

To install the `SpatialDatasets` package from Bioconductor:

```{r, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SpatialDatasets")
```

# Datasets

The package contains the following datasets:

- `spe_Keren_2018`: A study on triple negative breast cancer containing 40 samples measured using MIBI-TOF published by [Keren et al. (2018)](https://doi.org/10.1016/j.cell.2018.08.039).

- `Ferguson_Images`: The images, provided as a `zip` file, from a study on head and neck cutaneous squamous cell carcinoma containing 44 samples measured using IMC published by [Ferguson et al. (2022)](https://doi.org/10.1158/1078-0432.CCR-22-1332).

- `spe_Ferguson_2022`: A study on head and neck cutaneous squamous cell carcinoma containing 44 samples measured using IMC published by [Ferguson et al. (2022)](https://doi.org/10.1158/1078-0432.CCR-22-1332).

- `spe_Schurch_2020`: A study on advanced colorectal cancer containing 140 samples measured using CODEX published by [Schurch et al. (2020)](https://doi.org/10.1016/j.cell.2020.07.005). 

- `spe_Ali_2020`: A study on breast cancer containing 483 samples measured using IMC published by [Ali et al. (2020)](https://doi.org/10.1038/s43018-020-0026-6). 

- `spe_Amancherla_2025`: A study on heart transplant rejection containing 62 (49 adult and 13 pediatric) samples measured with Xenium published by [Amancherla et al. (2025)](https://doi.org/10.1101/2025.02.28.640852).

- `spe_Vannan_2025`: A study on pulmonary fibrosis containing 35 lung samples measured with Xenium published by [Vannan et al. (2025)](https://doi.org/10.1038/s41588-025-02080-x).

# Load data

The following examples show how to load the example datasets as `SpatialExperiment` objects in an R session.

There are two options for loading the datasets: either using named accessor functions or by querying the ExperimentHub database.

## Load using named accessors

```{r, message=FALSE}
library(SpatialExperiment)
library(SpatialDatasets)
```

### Keren et al. (2018)

A study on triple negative breast cancer containing 40 samples measured using MIBI-TOF published by [Keren et al. (2018)](https://doi.org/10.1016/j.cell.2018.08.039).

```{r, message=FALSE}
# load object
spe <- spe_Keren_2018()

# check object
spe
```
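
Printing the object gives only a summary. As a minimal sketch of a closer look, the chunk below uses the standard accessors that become available after `library(SpatialExperiment)`. It assumes only that `spe` is a `SpatialExperiment`; the assay names and annotation columns that come back depend on the dataset and are not assumed here.

```{r, eval=FALSE}
library(SpatialExperiment)
library(SpatialDatasets)

# load one of the datasets (any of the spe_*() accessors would work here)
spe <- spe_Keren_2018()

# number of features (rows) and cells (columns)
dim(spe)

# names of the assays stored in the object
assayNames(spe)

# first few rows of the cell-level annotations
head(colData(spe))

# first few rows of the spatial coordinates matrix
head(spatialCoords(spe))
```

The same calls apply to the other `SpatialExperiment` objects loaded in the sections below.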

### Ferguson et al. (2022)

A study on head and neck cutaneous squamous cell carcinoma containing 44 samples measured using IMC published by [Ferguson et al. (2022)](https://doi.org/10.1158/1078-0432.CCR-22-1332).

#### Ferguson Images

In the chunk below, we've provided code for generating a `CytoImageList` object from the images `zip` file provided by the `Ferguson_Images()` function. 

```{r, message=FALSE}
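# get the path to the zip archive of images and extract it to a temporary directory
zip <- Ferguson_Images()
tmp <- tempfile()
unzip(zip, exdir = tmp)

# read the extracted single-channel images into an on-disk CytoImageList
images <- cytomapper::loadImages(
  tmp,
  single_channel = TRUE,
  on_disk = TRUE,
  h5FilesPath = HDF5Array::getHDF5DumpDir()
)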

# check object
images
```

#### Ferguson SpatialExperiment Object

```{r, message=FALSE}
# load object
spe <- spe_Ferguson_2022()

# check object
spe
```

### Schurch et al. (2020)

A study on advanced colorectal cancer containing 140 samples measured using CODEX published by [Schurch et al. (2020)](https://doi.org/10.1016/j.cell.2020.07.005). 

```{r, message=FALSE}
# load object
spe <- spe_Schurch_2020()

# check object
spe
```

### Ali et al. (2020)

A study on breast cancer containing 483 samples measured using IMC published by [Ali et al. (2020)](https://doi.org/10.1038/s43018-020-0026-6). 

```{r, message=FALSE}
# load object
spe <- spe_Ali_2020()

# check object
spe
```


### Amancherla et al. (2025)

A study on heart transplant rejection containing 62 (49 adult and 13 pediatric) samples measured with Xenium published by [Amancherla et al. (2025)](https://doi.org/10.1101/2025.02.28.640852). 

```{r, message = FALSE}
# load object
spe <- spe_Amancherla_2025()

# check object
spe
```

### Vannan et al. (2025)

A study on pulmonary fibrosis containing 35 lung samples measured with Xenium published by [Vannan et al. (2025)](https://doi.org/10.1038/s41588-025-02080-x).

```{r, message = FALSE}
# load object
spe <- spe_Vannan_2025()

# check object
spe
```
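
## Load by querying ExperimentHub

The second option mentioned above, querying the ExperimentHub database, can be sketched as follows. This is a minimal example that assumes network access and that the records are found by searching for the package name. The record picked by position here is only for illustration; inspect the query result to choose the dataset you want.

```{r, eval=FALSE}
library(ExperimentHub)

# create an ExperimentHub instance (retrieves hub metadata; needs network access)
eh <- ExperimentHub()

# list the records associated with the SpatialDatasets package
sd_records <- query(eh, "SpatialDatasets")
sd_records

# retrieve one record; position 1 is an arbitrary choice for illustration
obj <- sd_records[[1]]

# check object
obj
```

Records can also be retrieved by their ExperimentHub identifier (the `EH...` names shown when printing `sd_records`) instead of by position.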

# Generating objects from raw data files

For reference, we include the scripts used to generate the `SpatialExperiment`, `MoleculeExperiment` or `CytoImageList` objects from the raw data files.

These scripts are saved in `/inst/scripts/` in the source code of the `SpatialDatasets` package. The scripts include references and links to the data files from the original sources for each dataset.
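
If the package is installed, the scripts can also be located from within R. The sketch below relies on the standard R behaviour that the contents of `inst/` are copied to the top level of the installed package; the file names it lists depend on the package version and are not assumed here.

```{r, eval=FALSE}
# path to the installed copy of inst/scripts ("" if the package is not installed)
scripts_dir <- system.file("scripts", package = "SpatialDatasets")

# list the available scripts
list.files(scripts_dir)
```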

# Session information

```{r}
sessionInfo()
```
