Contents

1 Overview

RBedMethyl provides disk-backed access to nanoporetech modkit bedMethyl files generated from ONT data, enabling ONT-scale workflows without loading full tables into memory. This vignette demonstrates a typical workflow: ingest, subset by region, compute per-region summaries, and coerce to downstream Bioconductor classes.

2 The bedMethyl format

modkit, the bioinformatics tool produced by Oxford Nanopore Technologies, is one of the easiest and most direct ways to extract site-specific methylation information from a long-read BAM file produced by Dorado. As explained in UCSC, the bedMethyl format is an extension of the standard BED 9 format, with the following columns:

column name description type
1 chrom name of reference sequence from BAM header str
2 start position 0-based start position int
3 end position 0-based exclusive end position int
4 modified base code single letter code for modified base str
5 score Equal to Nvalid_cov. int
6 strand ‘+’ for positive strand ‘-’ for negative strand, ‘.’ when strands are combined str
7 start position included for compatibility int
8 end position included for compatibility int
9 color included for compatibility, always 255,0,0 str
10 Nvalid_cov Number of reads with valid modification call int
11 fraction modified Percent of valid calls that are modified float
12 Nmod Number of calls with a modified base int
13 Ncanonical Number of calls with a canonical base int
14 Nother_mod Number of calls with a modified base, other modifications int
15 Ndelete Number of reads with a deletion at this reference position int
16 Nfail Number of calls where the probability of the call was below the threshold int
17 Ndiff Number of reads with a base other than the canonical base for this modification int
18 Nnocall Number of reads aligned to this reference position, with the correct canonical base, but without a base modification call int

3 RBedMethyl as a plug-and-play connector

RBedMethyl package provides a minimalistic interface, which allows the users to process bedMethyl files. They can:

  1. load them into R, with a minimal memory footprint with readBedMethyl
  2. subset them:
    • by chromosome(s) (subsetByChromosomes)
    • by region(s) (subsetByRegion or using the [] bracketing index system)
    • by coverage (filterByCoverage)
    • by a specific filter on an assay (subsetBy)
  3. create region summary statistics (summarizeByRegion)
  4. convert them to other Bioconductor-powered data structures using the as(x, name) coercion: RangedSummarizedExperiment and BSseq

4 Ingest a bedMethyl file

The RBedMethyl object is an HDF5-backed structure. It is designed to keep a single modification (i.e, “h” or “m”). The user can select which fields to keep from the bedMethyl file during the reading process. The correspondence is:

bedMethyl name RBedMethyl assay
Nvalid_cov coverage
fraction modified pct
Nmod mod_reads
Ncanonical unmod_reads
Nother_mod other_reads
Ndelete del_reads
Nfail fail_reads
Ndiff diff_reads
Nnocall nocall_reads

By default the readBedMethyl only loads “coverage”, “pct” and “mod_reads” assays.

# Example bedMethyl-like content, header is dropped
df <- data.frame(
  r1 = c("chr1", 0, 1, "m", 0, "+", 0, 1, 0, 10, 0.5, 5, 5, 0, 0, 0, 0, 0),
  r2 = c("chr1", 10, 11, "m", 0, "+", 10, 11, 0, 20, 0.25, 5, 15, 0, 0, 0, 0, 0),
  r3 = c("chr2", 0, 1, "m", 0, "-", 0, 1, 0, 8, 0.375, 3, 5, 0, 0, 0, 0, 0),
  r4 = c("chr2", 0, 1, "h", 0, "-", 0, 1, 0, 8, 0.375, 3, 5, 0, 0, 0, 0, 0)
)
df <- t(df)
colnames(df) <- c(
  "chrom", "chromStart", "chromEnd", "mod", "score", "strand",
  "thickStart", "thickEnd", "itemRgb", "coverage", "pct",
  "mod_reads", "unmod_reads", "other_reads",
  "del_reads", "fail_reads", "diff_reads", "nocall_reads"
)
df
#>    chrom  chromStart chromEnd mod score strand thickStart thickEnd itemRgb
#> r1 "chr1" "0"        "1"      "m" "0"   "+"    "0"        "1"      "0"    
#> r2 "chr1" "10"       "11"     "m" "0"   "+"    "10"       "11"     "0"    
#> r3 "chr2" "0"        "1"      "m" "0"   "-"    "0"        "1"      "0"    
#> r4 "chr2" "0"        "1"      "h" "0"   "-"    "0"        "1"      "0"    
#>    coverage pct     mod_reads unmod_reads other_reads del_reads fail_reads
#> r1 "10"     "0.5"   "5"       "5"         "0"         "0"       "0"       
#> r2 "20"     "0.25"  "5"       "15"        "0"         "0"       "0"       
#> r3 "8"      "0.375" "3"       "5"         "0"         "0"       "0"       
#> r4 "8"      "0.375" "3"       "5"         "0"         "0"       "0"       
#>    diff_reads nocall_reads
#> r1 "0"        "0"         
#> r2 "0"        "0"         
#> r3 "0"        "0"         
#> r4 "0"        "0"

bed <- tempfile(fileext = ".bed")
write.table(df, bed, quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)

# Read only "m" modification rows, and only coverage,pct and mod_reads assays
bm <- readBedMethyl(bed, mod = "m", fields = c("coverage", "pct", "mod_reads"))

bm
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

5 Subsetting example

# By one or more chromosomes
subsetByChromosomes(bm, "chr1")
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 2 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads
subsetByChromosomes(bm, c("chr1", "chr2"))
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

# By region
## Can supply the chromosome, start, end
subsetByRegion(bm, "chr1", 1, 12)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 1 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads
## Or a GRanges object
regions <- GenomicRanges::GRanges(
  seqnames = c("chr1", "chr2"),
  ranges = IRanges::IRanges(start = c(1, 1), end = c(12, 2))
)
subsetByRegion(bm, regions)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

## The [] bracketing index system also works
bm[regions]
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

# By minimum coverage
filterByCoverage(bm, 15)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 1 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

# By column using a function
## The following keeps only lines where the methylation percentage is higher than 0.3
subsetBy(bm, "pct", function(x) x > 0.3)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 2 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

6 Region-level summaries

regions <- GenomicRanges::GRanges(
  seqnames = c("chr1", "chr2"),
  ranges = IRanges::IRanges(start = c(1, 1), end = c(12, 2))
)
# For each region, generate summary statistics
## `mod_reads` is the total number of reads, `beta` stands for average methylation, `n_sites` is the number of sites
summarizeByRegion(bm, regions)
#> DataFrame with 2 rows and 4 columns
#>    coverage mod_reads      beta   n_sites
#>   <numeric> <numeric> <numeric> <integer>
#> 1        30        10  0.333333         2
#> 2         8         3  0.375000         1

7 Coercion to downstream classes

# RangedSummarizedExperiment
rse <- as(bm, "RangedSummarizedExperiment")
rse
#> class: RangedSummarizedExperiment 
#> dim: 3 1 
#> metadata(0):
#> assays(3): coverage mod_reads pct
#> rownames: NULL
#> rowData names(0):
#> colnames: NULL
#> colData names(0):

# bsseq (if installed)
if (requireNamespace("bsseq", quietly = TRUE) &&
    methods::hasMethod("coerce", c("RBedMethyl", "BSseq"))) {
  bs <- as(bm, "BSseq")
  bs
} else {
  message("bsseq is not installed or BSseq coercion is unavailable; skipping BSseq coercion.")
}
#> An object of type 'BSseq' with
#>   3 methylation loci
#>   1 samples
#> has not been smoothed
#> All assays are in-memory

7.1 Session info

sessionInfo()
#> R Under development (unstable) (2026-03-05 r89546)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] RBedMethyl_0.99.1 BiocStyle_2.39.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] SummarizedExperiment_1.41.1 rjson_0.2.23               
#>  [3] xfun_0.57                   bslib_0.10.0               
#>  [5] rhdf5_2.55.16               Biobase_2.71.0             
#>  [7] lattice_0.22-9              bitops_1.0-9               
#>  [9] rhdf5filters_1.23.3         tools_4.6.0                
#> [11] generics_0.1.4              curl_7.0.0                 
#> [13] stats4_4.6.0                parallel_4.6.0             
#> [15] R.oo_1.27.1                 Matrix_1.7-5               
#> [17] data.table_1.18.2.1         BSgenome_1.79.1            
#> [19] RColorBrewer_1.1-3          cigarillo_1.1.0            
#> [21] S4Vectors_0.49.0            sparseMatrixStats_1.23.0   
#> [23] lifecycle_1.0.5             compiler_4.6.0             
#> [25] farver_2.1.2                Rsamtools_2.27.1           
#> [27] Biostrings_2.79.5           statmod_1.5.1              
#> [29] Seqinfo_1.1.0               codetools_0.2-20           
#> [31] permute_0.9-10              htmltools_0.5.9            
#> [33] sass_0.4.10                 RCurl_1.98-1.18            
#> [35] yaml_2.3.12                 crayon_1.5.3               
#> [37] jquerylib_0.1.4             R.utils_2.13.0             
#> [39] BiocParallel_1.45.0         DelayedArray_0.37.0        
#> [41] cachem_1.1.0                limma_3.67.0               
#> [43] abind_1.4-8                 gtools_3.9.5               
#> [45] locfit_1.5-9.12             digest_0.6.39              
#> [47] restfulr_0.0.16             bookdown_0.46              
#> [49] fastmap_1.2.0               grid_4.6.0                 
#> [51] cli_3.6.5                   SparseArray_1.11.11        
#> [53] S4Arrays_1.11.1             XML_3.99-0.23              
#> [55] dichromat_2.0-0.1           h5mread_1.3.2              
#> [57] DelayedMatrixStats_1.33.0   scales_1.4.0               
#> [59] httr_1.4.8                  rmarkdown_2.30             
#> [61] XVector_0.51.0              matrixStats_1.5.0          
#> [63] otel_0.2.0                  R.methodsS3_1.8.2          
#> [65] beachmat_2.27.3             HDF5Array_1.39.0           
#> [67] evaluate_1.0.5              knitr_1.51                 
#> [69] GenomicRanges_1.63.1        IRanges_2.45.0             
#> [71] BiocIO_1.21.0               rtracklayer_1.71.3         
#> [73] rlang_1.1.7                 Rcpp_1.1.1                 
#> [75] glue_1.8.0                  BiocManager_1.30.27        
#> [77] BiocGenerics_0.57.0         bsseq_1.47.1               
#> [79] jsonlite_2.0.0              R6_2.6.1                   
#> [81] Rhdf5lib_1.33.6             GenomicAlignments_1.47.0   
#> [83] MatrixGenerics_1.23.0