1 Overview
2 The bedMethyl format
3 RBedMethyl as a plug-and-play connector
4 Ingest a bedMethyl file
5 Subsetting example
6 Region-level summaries
7 Coercion to downstream classes
- 7.1 Session info

1 Overview

RBedMethyl provides disk-backed access to nanoporetech modkit bedMethyl files generated from ONT data, enabling ONT-scale workflows without loading full tables into memory. This vignette demonstrates a typical workflow: ingest, subset by region, compute per-region summaries, and coerce to downstream Bioconductor classes.

2 The bedMethyl format

modkit, the bioinformatics tool produced by Oxford Nanopore Technologies, is one of the easiest and most direct ways to extract site-specific methylation information from a long-read BAM file produced by Dorado. As explained in UCSC, the bedMethyl format is an extension of the standard BED 9 format, with the following columns:

column	name	description	type
1	chrom	name of reference sequence from BAM header	str
2	start position	0-based start position	int
3	end position	0-based exclusive end position	int
4	modified base code	single letter code for modified base	str
5	score	Equal to N_{valid_cov}.	int
6	strand	‘+’ for positive strand ‘-’ for negative strand, ‘.’ when strands are combined	str
7	start position	included for compatibility	int
8	end position	included for compatibility	int
9	color	included for compatibility, always 255,0,0	str
10	N_{valid_cov}	Number of reads with valid modification call	int
11	fraction modified	Percent of valid calls that are modified	float
12	N_mod	Number of calls with a modified base	int
13	N_canonical	Number of calls with a canonical base	int
14	N_{other_mod}	Number of calls with a modified base, other modifications	int
15	N_delete	Number of reads with a deletion at this reference position	int
16	N_fail	Number of calls where the probability of the call was below the threshold	int
17	N_diff	Number of reads with a base other than the canonical base for this modification	int
18	N_nocall	Number of reads aligned to this reference position, with the correct canonical base, but without a base modification call	int

3 RBedMethyl as a plug-and-play connector

RBedMethyl package provides a minimalistic interface, which allows the users to process bedMethyl files. They can:

load them into R, with a minimal memory footprint with readBedMethyl
subset them:
- by chromosome(s) (subsetByChromosomes)
- by region(s) (subsetByRegion or using the [] bracketing index system)
- by coverage (filterByCoverage)
- by a specific filter on an assay (subsetBy)
create region summary statistics (summarizeByRegion)
convert them to other Bioconductor-powered data structures using the as(x, name) coercion: RangedSummarizedExperiment and BSseq

4 Ingest a bedMethyl file

The RBedMethyl object is an HDF5-backed structure. It is designed to keep a single modification (i.e, “h” or “m”). The user can select which fields to keep from the bedMethyl file during the reading process. The correspondence is:

bedMethyl name	RBedMethyl assay
N_{valid_cov}	coverage
fraction modified	pct
N_mod	mod_reads
N_canonical	unmod_reads
N_{other_mod}	other_reads
N_delete	del_reads
N_fail	fail_reads
N_diff	diff_reads
N_nocall	nocall_reads

By default the readBedMethyl only loads “coverage”, “pct” and “mod_reads” assays.

# Example bedMethyl-like content, header is dropped
df <- data.frame(
  r1 = c("chr1", 0, 1, "m", 0, "+", 0, 1, 0, 10, 0.5, 5, 5, 0, 0, 0, 0, 0),
  r2 = c("chr1", 10, 11, "m", 0, "+", 10, 11, 0, 20, 0.25, 5, 15, 0, 0, 0, 0, 0),
  r3 = c("chr2", 0, 1, "m", 0, "-", 0, 1, 0, 8, 0.375, 3, 5, 0, 0, 0, 0, 0),
  r4 = c("chr2", 0, 1, "h", 0, "-", 0, 1, 0, 8, 0.375, 3, 5, 0, 0, 0, 0, 0)
)
df <- t(df)
colnames(df) <- c(
  "chrom", "chromStart", "chromEnd", "mod", "score", "strand",
  "thickStart", "thickEnd", "itemRgb", "coverage", "pct",
  "mod_reads", "unmod_reads", "other_reads",
  "del_reads", "fail_reads", "diff_reads", "nocall_reads"
)
df
#>    chrom  chromStart chromEnd mod score strand thickStart thickEnd itemRgb
#> r1 "chr1" "0"        "1"      "m" "0"   "+"    "0"        "1"      "0"    
#> r2 "chr1" "10"       "11"     "m" "0"   "+"    "10"       "11"     "0"    
#> r3 "chr2" "0"        "1"      "m" "0"   "-"    "0"        "1"      "0"    
#> r4 "chr2" "0"        "1"      "h" "0"   "-"    "0"        "1"      "0"    
#>    coverage pct     mod_reads unmod_reads other_reads del_reads fail_reads
#> r1 "10"     "0.5"   "5"       "5"         "0"         "0"       "0"       
#> r2 "20"     "0.25"  "5"       "15"        "0"         "0"       "0"       
#> r3 "8"      "0.375" "3"       "5"         "0"         "0"       "0"       
#> r4 "8"      "0.375" "3"       "5"         "0"         "0"       "0"       
#>    diff_reads nocall_reads
#> r1 "0"        "0"         
#> r2 "0"        "0"         
#> r3 "0"        "0"         
#> r4 "0"        "0"

bed <- tempfile(fileext = ".bed")
write.table(df, bed, quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)

# Read only "m" modification rows, and only coverage,pct and mod_reads assays
bm <- readBedMethyl(bed, mod = "m", fields = c("coverage", "pct", "mod_reads"))

bm
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

5 Subsetting example

# By one or more chromosomes
subsetByChromosomes(bm, "chr1")
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 2 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads
subsetByChromosomes(bm, c("chr1", "chr2"))
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

# By region
## Can supply the chromosome, start, end
subsetByRegion(bm, "chr1", 1, 12)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 1 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads
## Or a GRanges object
regions <- GenomicRanges::GRanges(
  seqnames = c("chr1", "chr2"),
  ranges = IRanges::IRanges(start = c(1, 1), end = c(12, 2))
)
subsetByRegion(bm, regions)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

## The [] bracketing index system also works
bm[regions]
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 3 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

# By minimum coverage
filterByCoverage(bm, 15)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 1 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

# By column using a function
## The following keeps only lines where the methylation percentage is higher than 0.3
subsetBy(bm, "pct", function(x) x > 0.3)
#> RBedMethyl object
#>   Modification: m 
#>   Rows (active/total): 2 / 3 
#>   Chromosomes: 2 ( chr1, chr2 )
#>   Assays: chrom, chromStart, chromEnd, strand, coverage, pct, mod_reads

6 Region-level summaries

regions <- GenomicRanges::GRanges(
  seqnames = c("chr1", "chr2"),
  ranges = IRanges::IRanges(start = c(1, 1), end = c(12, 2))
)
# For each region, generate summary statistics
## `mod_reads` is the total number of reads, `beta` stands for average methylation, `n_sites` is the number of sites
summarizeByRegion(bm, regions)
#> DataFrame with 2 rows and 4 columns
#>    coverage mod_reads      beta   n_sites
#>   <numeric> <numeric> <numeric> <integer>
#> 1        30        10  0.333333         2
#> 2         8         3  0.375000         1

7 Coercion to downstream classes

# RangedSummarizedExperiment
rse <- as(bm, "RangedSummarizedExperiment")
rse
#> class: RangedSummarizedExperiment 
#> dim: 3 1 
#> metadata(0):
#> assays(3): coverage mod_reads pct
#> rownames: NULL
#> rowData names(0):
#> colnames: NULL
#> colData names(0):

# bsseq (if installed)
if (requireNamespace("bsseq", quietly = TRUE) &&
    methods::hasMethod("coerce", c("RBedMethyl", "BSseq"))) {
  bs <- as(bm, "BSseq")
  bs
} else {
  message("bsseq is not installed or BSseq coercion is unavailable; skipping BSseq coercion.")
}
#> An object of type 'BSseq' with
#>   3 methylation loci
#>   1 samples
#> has not been smoothed
#> Some assays are HDF5Array-backed

7.1 Session info

sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] RBedMethyl_1.1.1 BiocStyle_2.41.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] SummarizedExperiment_1.43.0 rjson_0.2.23               
#>  [3] xfun_0.60                   bslib_0.11.0               
#>  [5] rhdf5_2.57.1                Biobase_2.73.1             
#>  [7] lattice_0.22-9              bitops_1.0-9               
#>  [9] rhdf5filters_1.25.0         tools_4.6.1                
#> [11] generics_0.1.4              curl_7.1.0                 
#> [13] stats4_4.6.1                parallel_4.6.1             
#> [15] R.oo_1.27.1                 BiocBaseUtils_1.15.1       
#> [17] Matrix_1.7-5                data.table_1.18.4          
#> [19] BSgenome_1.81.0             RColorBrewer_1.1-3         
#> [21] cigarillo_1.3.0             S4Vectors_0.51.5           
#> [23] sparseMatrixStats_1.25.0    lifecycle_1.0.5            
#> [25] compiler_4.6.1              farver_2.1.2               
#> [27] Rsamtools_2.29.0            Biostrings_2.81.5          
#> [29] statmod_1.5.2               Seqinfo_1.3.0              
#> [31] codetools_0.2-20            permute_0.9-10             
#> [33] htmltools_0.5.9             sass_0.4.10                
#> [35] RCurl_1.98-1.19             yaml_2.3.12                
#> [37] beachmat.hdf5_1.11.0        crayon_1.5.3               
#> [39] jquerylib_0.1.4             R.utils_2.13.0             
#> [41] BiocParallel_1.47.0         DelayedArray_0.39.3        
#> [43] cachem_1.1.0                limma_3.69.2               
#> [45] abind_1.4-8                 gtools_3.9.5               
#> [47] locfit_1.5-9.12             digest_0.6.39              
#> [49] restfulr_0.0.17             bookdown_0.47              
#> [51] fastmap_1.2.0               grid_4.6.1                 
#> [53] cli_3.6.6                   SparseArray_1.13.2         
#> [55] S4Arrays_1.13.0             XML_3.99-0.23              
#> [57] dichromat_2.0-0.1           h5mread_1.5.0              
#> [59] DelayedMatrixStats_1.35.0   scales_1.4.0               
#> [61] httr_1.4.8                  rmarkdown_2.31             
#> [63] XVector_0.53.0              matrixStats_1.5.0          
#> [65] otel_0.2.0                  R.methodsS3_1.8.2          
#> [67] beachmat_2.29.0             HDF5Array_1.41.0           
#> [69] evaluate_1.0.5              knitr_1.51                 
#> [71] GenomicRanges_1.65.1        IRanges_2.47.2             
#> [73] BiocIO_1.23.3               rtracklayer_1.73.0         
#> [75] rlang_1.3.0                 Rcpp_1.1.2                 
#> [77] glue_1.8.1                  BiocManager_1.30.27        
#> [79] BiocGenerics_0.59.10        bsseq_1.49.0               
#> [81] jsonlite_2.0.0              R6_2.6.1                   
#> [83] Rhdf5lib_2.1.0              GenomicAlignments_1.49.1   
#> [85] MatrixGenerics_1.25.0

ONT-scale workflows with RBedMethyl

January 25, 2026

Contents