TileDBArray 1.15.4
TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used. The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray
Writing an ordinary matrix to a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.55225529 1.05841679 0.59685279 . 0.9952546 0.4874329
## [2,] 0.75818475 -0.32059649 -0.62951946 . -0.9856361 -0.6343097
## [3,] -0.19772213 -1.69250598 0.25821687 . 0.6620568 -1.3350016
## [4,] -0.00855006 2.14751878 0.76214142 . 0.2165524 0.5370952
## [5,] -0.75860963 -0.42281519 -0.43877821 . 0.1588387 -1.0420297
## ... . . . . . .
## [96,] -0.17576650 -0.11818332 -1.19131968 . -0.12141830 -0.51579676
## [97,] -1.32520932 -0.32362425 -0.09594838 . -0.82035920 1.48051420
## [98,] 0.51706970 0.59060632 -0.40619360 . 0.46422347 0.38635703
## [99,] -0.39102479 0.24542996 -0.37885584 . 0.99811702 0.06224507
## [100,] 1.17681342 0.36311851 -1.08269506 . -0.25327928 -1.00443096
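As a quick sanity check, here is a minimal sketch using only the calls shown above plus base R. It confirms that the returned object is a TileDBMatrix, that it is also a DelayedMatrix, and that realizing it with as.matrix() recovers the original values. The seed, the matrix size, and the variable names M and tm are arbitrary choices for illustration.

```r
library(TileDBArray)

# Arbitrary seed and size, chosen only to make the sketch reproducible.
set.seed(100)
M <- matrix(rnorm(1000), ncol=10)
tm <- writeTileDBArray(M)

# The result is a TileDB-backed matrix that is also a DelayedMatrix.
print(is(tm, "TileDBMatrix"))
print(is(tm, "DelayedMatrix"))

# Realizing the backend in memory should give back the original values.
print(all.equal(as.matrix(tm), M))
```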
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.55225529 1.05841679 0.59685279 . 0.9952546 0.4874329
## [2,] 0.75818475 -0.32059649 -0.62951946 . -0.9856361 -0.6343097
## [3,] -0.19772213 -1.69250598 0.25821687 . 0.6620568 -1.3350016
## [4,] -0.00855006 2.14751878 0.76214142 . 0.2165524 0.5370952
## [5,] -0.75860963 -0.42281519 -0.43877821 . 0.1588387 -1.0420297
## ... . . . . . .
## [96,] -0.17576650 -0.11818332 -1.19131968 . -0.12141830 -0.51579676
## [97,] -1.32520932 -0.32362425 -0.09594838 . -0.82035920 1.48051420
## [98,] 0.51706970 0.59060632 -0.40619360 . 0.46422347 0.38635703
## [99,] -0.39102479 0.24542996 -0.37885584 . 0.99811702 0.06224507
## [100,] 1.17681342 0.36311851 -1.08269506 . -0.25327928 -1.00443096
This also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.0 0.3 0.0 . -0.44 0.00
## [2,] 0.0 0.0 0.0 . 0.00 0.00
## [3,] 0.0 0.0 0.0 . 0.00 0.00
## [4,] 0.0 0.0 0.0 . 0.00 0.00
## [5,] 0.0 0.0 0.0 . 0.00 0.00
## ... . . . . . .
## [996,] 0 0 0 . 0.00 0.00
## [997,] 0 0 0 . 0.00 0.00
## [998,] 0 0 0 . 0.00 0.00
## [999,] 0 0 0 . 0.00 0.00
## [1000,] 0 0 0 . 0.88 0.00
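To confirm that sparsity is preserved, the following sketch writes a smaller random sparse matrix and queries it with is_sparse(). We assume is_sparse() is available once TileDBArray is attached, since it is provided by the DelayedArray/S4Arrays stack that TileDBArray loads. The sizes, density, and variable names are illustrative only.

```r
library(TileDBArray)
library(Matrix)

set.seed(101)
S <- rsparsematrix(200, 50, density=0.05)
sparse_out <- writeTileDBArray(S)

# The TileDB-backed matrix should report itself as sparse.
print(is_sparse(sparse_out))

# Densifying both objects should give the same values.
print(all.equal(as.matrix(sparse_out), as.matrix(S)))
```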
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE TRUE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . TRUE FALSE
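The example above only shows a logical matrix. A similar sketch for an integer matrix (here, Poisson counts, an arbitrary choice) checks the reported type with type() and compares the realized values against the input. The variable names are hypothetical.

```r
library(TileDBArray)

set.seed(102)
counts <- matrix(rpois(1000, lambda=5), ncol=10)
int_out <- writeTileDBArray(counts)

# The storage type should be carried over to the TileDB-backed matrix.
print(type(int_out))

# The realized values should match the input counts.
print(all.equal(as.matrix(int_out), counts))
```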
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.55225529 1.05841679 0.59685279 . 0.9952546 0.4874329
## GENE_2 0.75818475 -0.32059649 -0.62951946 . -0.9856361 -0.6343097
## GENE_3 -0.19772213 -1.69250598 0.25821687 . 0.6620568 -1.3350016
## GENE_4 -0.00855006 2.14751878 0.76214142 . 0.2165524 0.5370952
## GENE_5 -0.75860963 -0.42281519 -0.43877821 . 0.1588387 -1.0420297
## ... . . . . . .
## GENE_96 -0.17576650 -0.11818332 -1.19131968 . -0.12141830 -0.51579676
## GENE_97 -1.32520932 -0.32362425 -0.09594838 . -0.82035920 1.48051420
## GENE_98 0.51706970 0.59060632 -0.40619360 . 0.46422347 0.38635703
## GENE_99 -0.39102479 0.24542996 -0.37885584 . 0.99811702 0.06224507
## GENE_100 1.17681342 0.36311851 -1.08269506 . -0.25327928 -1.00443096
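The following sketch checks that the dimension names survive the round trip. It uses a small, separately named matrix so that it does not overwrite X. The names follow the same GENE_/SAMP_ pattern as above.

```r
library(TileDBArray)

named <- matrix(rnorm(20), ncol=4)
rownames(named) <- sprintf("GENE_%i", seq_len(nrow(named)))
colnames(named) <- sprintf("SAMP_%i", seq_len(ncol(named)))
named_out <- writeTileDBArray(named)

# Row and column names should be stored alongside the data.
print(identical(rownames(named_out), rownames(named)))
print(identical(colnames(named_out), colnames(named)))
```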
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.55225529 0.75818475 -0.19772213 -0.00855006 -0.75860963 0.35270212
We can also perform manipulations like subsetting and arithmetic. Note that these operations do not affect the data in the TileDB backend; rather, they are delayed until the values are explicitly required, hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.55225529 1.05841679 0.59685279 -0.91184764 -0.72989179
## GENE_2 0.75818475 -0.32059649 -0.62951946 -0.45876237 0.36070581
## GENE_3 -0.19772213 -1.69250598 0.25821687 -0.64271195 0.24109772
## GENE_4 -0.00855006 2.14751878 0.76214142 0.56478573 -1.43571887
## GENE_5 -0.75860963 -0.42281519 -0.43877821 0.30860000 -1.52049516
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.10451058 2.11683359 1.19370559 . 1.9905091 0.9748657
## GENE_2 1.51636949 -0.64119298 -1.25903891 . -1.9712722 -1.2686193
## GENE_3 -0.39544426 -3.38501197 0.51643373 . 1.3241135 -2.6700033
## GENE_4 -0.01710012 4.29503757 1.52428285 . 0.4331049 1.0741903
## GENE_5 -1.51721926 -0.84563037 -0.87755642 . 0.3176774 -2.0840594
## ... . . . . . .
## GENE_96 -0.3515330 -0.2363666 -2.3826394 . -0.2428366 -1.0315935
## GENE_97 -2.6504186 -0.6472485 -0.1918968 . -1.6407184 2.9610284
## GENE_98 1.0341394 1.1812126 -0.8123872 . 0.9284469 0.7727141
## GENE_99 -0.7820496 0.4908599 -0.7577117 . 1.9962340 0.1244901
## GENE_100 2.3536268 0.7262370 -2.1653901 . -0.5065586 -2.0088619
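To see the delayed operations explicitly, the sketch below builds a subset-and-multiply expression, prints its operation tree with DelayedArray's showtree(), and only then realizes it with as.matrix(). The seed and variable names are arbitrary.

```r
library(TileDBArray)

set.seed(103)
M <- matrix(rnorm(1000), ncol=10)
dm <- as(M, "TileDBArray")

# Subsetting and arithmetic only record operations; nothing is computed yet.
delayed <- dm[1:5, 1:5] * 2
showtree(delayed)

# Values are pulled from the TileDB backend when the result is realized.
realized <- as.matrix(delayed)
print(all.equal(realized, M[1:5, 1:5] * 2))
```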
More complex matrix operations supported by DelayedArray, such as column sums and matrix multiplication, also work:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -16.116818 -15.488633 3.367267 -9.502922 -12.751363 -15.051254 10.745261
## SAMP_8 SAMP_9 SAMP_10
## 22.137320 9.111438 -10.861111
out %*% runif(ncol(out))
## [,1]
## GENE_1 1.96802166
## GENE_2 -0.13903100
## GENE_3 -1.31334838
## GENE_4 3.63881743
## GENE_5 -0.05945493
## GENE_6 -0.80250601
## GENE_7 -2.20297613
## GENE_8 -0.03680921
## GENE_9 -0.37354603
## GENE_10 0.87688789
## GENE_11 -3.67761325
## GENE_12 -0.77803913
## GENE_13 -4.16352179
## GENE_14 0.63612802
## GENE_15 -2.09720219
## GENE_16 -2.82048670
## GENE_17 -1.00831262
## GENE_18 -0.71870578
## GENE_19 -1.24165794
## GENE_20 -0.46595190
## GENE_21 -0.80787082
## GENE_22 -3.04071783
## GENE_23 -0.25497120
## GENE_24 0.23666383
## GENE_25 1.20200342
## GENE_26 1.40521521
## GENE_27 2.44865209
## GENE_28 3.33245497
## GENE_29 -0.73206178
## GENE_30 -5.96707585
## GENE_31 -1.77303220
## GENE_32 -0.70433768
## GENE_33 -0.12246510
## GENE_34 -3.44003250
## GENE_35 5.51858609
## GENE_36 4.12337340
## GENE_37 0.95354308
## GENE_38 -0.08490647
## GENE_39 -2.06022334
## GENE_40 -0.06648107
## GENE_41 0.25134843
## GENE_42 -0.53268152
## GENE_43 -0.77511387
## GENE_44 0.20584415
## GENE_45 0.51884586
## GENE_46 -1.09659632
## GENE_47 1.11005259
## GENE_48 0.08898082
## GENE_49 -1.25099846
## GENE_50 0.82839299
## GENE_51 0.11786558
## GENE_52 1.45872887
## GENE_53 2.10534980
## GENE_54 -0.02041008
## GENE_55 2.25828786
## GENE_56 -0.61428348
## GENE_57 -0.80040753
## GENE_58 -0.37694202
## GENE_59 -0.64773841
## GENE_60 -0.77541363
## GENE_61 0.93454974
## GENE_62 2.67374304
## GENE_63 1.05734061
## GENE_64 -0.53312240
## GENE_65 -2.13646474
## GENE_66 0.68777202
## GENE_67 -1.13359644
## GENE_68 1.88146625
## GENE_69 1.89652338
## GENE_70 1.99847955
## GENE_71 1.53948555
## GENE_72 1.77681762
## GENE_73 -2.59546303
## GENE_74 -0.43085000
## GENE_75 1.82299496
## GENE_76 0.59384651
## GENE_77 2.16403073
## GENE_78 3.91229175
## GENE_79 -2.28961094
## GENE_80 -0.70917091
## GENE_81 1.34222126
## GENE_82 0.08942353
## GENE_83 -0.65824296
## GENE_84 -1.70449509
## GENE_85 -0.21869392
## GENE_86 -0.85106566
## GENE_87 0.41928845
## GENE_88 -0.78248265
## GENE_89 0.02497867
## GENE_90 1.20208485
## GENE_91 -1.78656272
## GENE_92 0.56039590
## GENE_93 -0.90487462
## GENE_94 -0.50683384
## GENE_95 0.96836004
## GENE_96 -0.96050732
## GENE_97 0.26294959
## GENE_98 0.48604834
## GENE_99 -1.23210291
## GENE_100 -2.63622383
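A simple way to gain confidence in these results is to compare them against the same computations on the in-memory matrix. The sketch below does this for colSums() and for a matrix-vector product. The product is flattened with as.vector() so that only the values are compared. The seed and names are illustrative.

```r
library(TileDBArray)

set.seed(104)
M <- matrix(rnorm(1000), ncol=10)
dm <- as(M, "TileDBArray")

# Column sums computed from the backend should match the in-memory result.
print(all.equal(colSums(dm), colSums(M)))

# Likewise for a matrix-vector product.
v <- runif(ncol(dm))
prod_tiledb <- as.vector(as.matrix(dm %*% v))
prod_memory <- as.vector(M %*% v)
print(all.equal(prod_tiledb, prod_memory))
```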
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray(). For example, the call below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.95942298 0.07138591 0.72378094 . 0.8292767 -0.1295650
## [2,] 0.25769471 0.23607843 0.65032623 . 1.0759468 -1.4793737
## [3,] -1.48594494 0.57356877 0.78875974 . -0.7007660 0.1050982
## [4,] 1.16547374 0.79770396 0.44324183 . 1.8395335 -0.3787531
## [5,] -0.02676217 -1.74428665 -0.30632079 . -0.8642533 2.1463600
## ... . . . . . .
## [96,] -1.0923628 -0.1567279 -0.4464441 . -0.63094079 -0.68922334
## [97,] -0.2992144 -0.8601263 -1.6438580 . -0.75064489 0.69066139
## [98,] -0.4538741 -0.0633177 -1.0429181 . 0.69807909 0.87878317
## [99,] -0.9564045 -1.5011987 -1.6785386 . 0.06938024 0.75339878
## [100,] -0.3370789 1.4758302 -1.0024355 . -0.67902229 0.12265753
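Once an array has been written to a known path, it can be reopened later. The sketch below assumes that the TileDBArray() constructor accepts a path and forwards an attr= argument naming the attribute to read. We understand it to pass this to TileDBArraySeed(), but you should check ?TileDBArray for your installed version. The variable names are hypothetical.

```r
library(TileDBArray)

set.seed(105)
M <- matrix(rnorm(1000), ncol=10)
p <- tempfile()
written <- writeTileDBArray(M, path=p, attr="WHEE")

# Re-open the stored array from its path, naming the attribute to read
# (attr= is assumed to be forwarded to TileDBArraySeed()).
reloaded <- TileDBArray(p, attr="WHEE")
print(all.equal(as.matrix(reloaded), M))
```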
As these arguments cannot be passed during coercion, we instead provide global settings that can be set or unset to affect the outcome. For example, setTileDBPath() controls where the backend is written:
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.95942298 0.07138591 0.72378094 . 0.8292767 -0.1295650
## [2,] 0.25769471 0.23607843 0.65032623 . 1.0759468 -1.4793737
## [3,] -1.48594494 0.57356877 0.78875974 . -0.7007660 0.1050982
## [4,] 1.16547374 0.79770396 0.44324183 . 1.8395335 -0.3787531
## [5,] -0.02676217 -1.74428665 -0.30632079 . -0.8642533 2.1463600
## ... . . . . . .
## [96,] -1.0923628 -0.1567279 -0.4464441 . -0.63094079 -0.68922334
## [97,] -0.2992144 -0.8601263 -1.6438580 . -0.75064489 0.69066139
## [98,] -0.4538741 -0.0633177 -1.0429181 . 0.69807909 0.87878317
## [99,] -0.9564045 -1.5011987 -1.6785386 . 0.06938024 0.75339878
## [100,] -0.3370789 1.4758302 -1.0024355 . -0.67902229 0.12265753
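The sketch below shows the full set-use-unset cycle for the global path. It assumes that getTileDBPath() returns the currently configured path and that setTileDBPath(NULL) unsets it. This is our reading of the package's getter and setter, so check ?setTileDBPath. Because a local TileDB array is stored as a directory, dir.exists() can confirm where the data landed.

```r
library(TileDBArray)

set.seed(106)
M <- matrix(rnorm(1000), ncol=10)

# Point coercion at an explicit location and inspect the setting.
p3 <- tempfile()
setTileDBPath(p3)
print(identical(getTileDBPath(), p3))
coerced <- as(M, "TileDBArray")

# A local TileDB array is stored as a directory at that location.
print(dir.exists(p3))

# Unset the global path (assumed to accept NULL) so that later coercions
# do not try to reuse a location that already holds an array.
setTileDBPath(NULL)
```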
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.6.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.15.4 DelayedArray_0.31.13
## [4] SparseArray_1.5.41 S4Arrays_1.5.10 IRanges_2.39.2
## [7] abind_1.4-8 S4Vectors_0.43.2 MatrixGenerics_1.17.0
## [10] matrixStats_1.4.1 BiocGenerics_0.51.3 Matrix_1.7-0
## [13] BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.4.1
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.5.0.1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.45.0 tiledb_0.30.0
## [16] knitr_1.48 bookdown_0.40 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.47
## [22] sass_0.4.9 bit64_4.5.2 cli_3.6.3
## [25] zlibbioc_1.51.1 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.1 lifecycle_1.0.4 data.table_1.16.0
## [31] evaluate_1.0.0 nanotime_0.3.10 zoo_1.8-12
## [34] rmarkdown_2.28 tools_4.4.1 htmltools_0.5.8.1