---
title: "cBioPortalData: Data Build Errors"
author: "Marcel Ramos & Levi Waldron"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{cBioPortal Data Build Errors}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    number_sections: no
    toc_depth: 4
---

```{r, setup, include=FALSE}
knitr::opts_chunk$set(cache = TRUE)
```

# Loading

```{r load_libs, include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(cBioPortalData)
library(AnVIL)
library(jsonlite)
```

# Overview

This document serves as a reporting tool for errors that occur
when running our utility functions on the cBioPortal datasets.

## Data from the cBioPortal API (`cBioPortalData()`)

Typically, the number of errors encountered via the API is low.
Only a handful of studies error when we apply the utility functions
that provide a `MultiAssayExperiment` data representation.

First, we load the error information from the JSON file shipped with the package.

```{r load_api_errs}
api_errs <- system.file(
    "extdata", "api", "err_api_info.json",
    package = "cBioPortalData", mustWork = TRUE
)
err_api_info <- fromJSON(api_errs)
```
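As a minimal sketch of what `fromJSON()` returns for this kind of file, the
example below parses a small hypothetical JSON string. It assumes the layout
suggested by the rest of this document: one key per error message, each
mapping to the study identifiers that triggered it. The error messages and
study names are made up for illustration.

```r
library(jsonlite)

# Hypothetical JSON mirroring the assumed layout of the error file:
# one key per error message, each holding an array of study identifiers.
toy_json <- '{
  "error A": ["study_1", "study_2"],
  "error B": ["study_3"]
}'

toy_errs <- fromJSON(toy_json)

# A JSON object becomes a named list; arrays of strings become
# character vectors.
str(toy_errs)
```

With this structure, `names()` gives the error messages and `lengths()` gives
the number of studies affected by each one.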

We can now inspect the contents of the data:

```{r inspect_api_errs}
class(err_api_info)
length(err_api_info)
lengths(err_api_info)
```

There were `r length(err_api_info)` unique errors during the last
build run.

```{r api_err_names}
names(err_api_info)
```

The most common error was `Inconsistent build numbers found`. It occurs
when the annotations in a study refer to more than one build number and
the differences could not be resolved.
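One way to rank errors by frequency is to sort the result of `lengths()`.
The sketch below uses a hypothetical named list (`toy_errs`) in place of the
real error data, assuming each element is a character vector of study
identifiers.

```r
# Hypothetical stand-in for the parsed error list
toy_errs <- list(
    "error A" = c("study_1", "study_2"),
    "error B" = "study_3"
)

# Number of affected studies per error, most frequent first
n_studies <- sort(lengths(toy_errs), decreasing = TRUE)
n_studies

# Message of the most common error
names(n_studies)[1]
```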

To see which datasets (`cancer_study_id`s) have that error, we can use:

```{r inconsistent_build}
err_api_info[['Inconsistent build numbers found']]
```
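The lookup above goes from an error to its studies. For the reverse
direction, a possible approach is to flatten the list into a two-column table
and filter by study. This is only a sketch on hypothetical data; the column
names `cancer_study_id` and `error` are chosen here for illustration.

```r
# Hypothetical stand-in for the parsed error list
toy_errs <- list(
    "error A" = c("study_1", "study_2"),
    "error B" = "study_3"
)

# One row per (study, error) pair
err_table <- data.frame(
    cancer_study_id = unlist(toy_errs, use.names = FALSE),
    error = rep(names(toy_errs), lengths(toy_errs)),
    stringsAsFactors = FALSE
)

# Errors recorded for a single study
err_table[err_table$cancer_study_id == "study_3", ]
```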

We can also have a look at the entirety of the dataset.

```{r all_api_errs}
err_api_info
```

## Packaged data from `cBioDataPack()`

Now let's look at the errors in the packaged datasets that are used by
`cBioDataPack()`:

```{r load_pack_errs}
pack_errs <- system.file(
    "extdata", "pack", "err_pack_info.json",
    package = "cBioPortalData", mustWork = TRUE
)
err_pack_info <- fromJSON(pack_errs)
```

We can inspect this data in the same way as the API errors:

```{r inspect_pack_errs}
length(err_pack_info)
lengths(err_pack_info)
```

We can get a list of all the errors present:

```{r pack_err_names}
names(err_pack_info)
```
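To check whether the two data sources fail in the same ways, the error
messages can be compared with set operations. The sketch below uses two
hypothetical vectors of error messages in place of the `names()` of the two
error lists.

```r
# Hypothetical error messages from the two sources
api_names <- c("error A", "error B")
pack_names <- c("error B", "error C")

# Errors seen in both sources
shared <- intersect(api_names, pack_names)
shared

# Errors seen only in the packaged data
pack_only <- setdiff(pack_names, api_names)
pack_only
```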

And finally the full list of errors:

```{r all_pack_errs}
err_pack_info
```

# sessionInfo

```{r sessioninfo}
sessionInfo()
```
