I am using the bigmemory
package to load a heavy dataset, but when I check the size of the object (with function object.size
), it always returns 664 bytes. As far I as understand, the weight should be almost the same as a classic R matrix, but depending of the class (double or integer). Then, why do I obtain 664 bytes as an answer?. Below, reproducible code. The first chunck is really slow, so feel free to reduce the number of simulated values. With (10^6 * 20) will be enough.
# CREATE BIG DATABASE -----------------------------------------------------
data <- as.data.frame(matrix(rnorm(6 * 10^6 * 20), ncol = 20))
write.table(data, file = "big-data.csv", sep = ",", row.names = FALSE)
format(object.size(data), units = "auto")
rm(list = ls())
# BIGMEMORY READ ----------------------------------------------------------
library(bigmemory)
ini <- Sys.time()
data <- read.big.matrix(file = "big-data.csv", header = TRUE, type = "double")
print(Sys.time() - ini)
print(object.size(data), units = "auto")
To determine the size of the bigmemory
matrix use:
> GetMatrixSize(data)
[1] 9.6e+08
Data stored in big.matrix objects can be of type double (8 bytes, the default), integer (4 bytes), short (2 bytes), or char (1 byte).
The reason for the size disparity is that data
stores a pointer to a memory-mapped file. You should be able to find the new file in the temporary directory of your machine. - [Paragraph quoted from R High Performance Programming]
Essentially, bigmatrix maintains a binary data file on the disk called a backing file that holds all of the values in a data set. When values from a bigmatrix object are needed by R, a check is performed to see if they are already in RAM (cached). If they are, then the cached values are returned. If they are not cached, then they are retrieved from the backing file. These caching operations reduce the amount of time needed to access and manipulate the data across separate calls, and they are transparent to the statistician.
See page 8 of the documentation for a description
https://cran.r-project.org/web/packages/bigmemory/bigmemory.pdf
Ref: