Open a .dat file in R, similar to joblib in Python


I want to read a .dat file in R. data.dat contains two lists of lists (a, b) with dimensions 50*5000*30 and 50*5000*5 respectively. a contains values between 0 and 1024, and b contains values between 0 and 1.

1st Attempt:

#install.packages("devtools")
#devtools::install_github("insysbio/dbs-package")
library("dbs")
file_path = system.file(package = "dbs", "data.dat")
raw_data = read.dat(file_path)
data = import.dat(raw_data)

Error

data = import.dat(raw_data)
Error in x[subset & !is.na(subset), vars, drop = drop] : 
  subscript out of bounds

2nd Attempt:

> read.table("data.dat", fileEncoding="latin1")

Error

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : 
  line 2 did not have 3 elements

3rd Attempt:

data = scan(file="data.dat", what=list(x="", y="", z=""), flush=TRUE) 

Output: three lists are read, but they contain garbage values.

I can open the file in Python as follows:

import joblib
a, b = joblib.load("data.dat")

Is there any alternative to joblib in R?


Solution

  • The reticulate package provides an R interface to Python modules, classes, and functions, so you can load the file with joblib in Python and access the result from R.

    #install.packages("reticulate")
    library(reticulate)
    # Select ONE Python environment; the three calls below are alternatives
    use_python("/usr/local/bin/python")
    #use_virtualenv("myenv")
    #use_condaenv("myenv")
    py_config()
    
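    # Run a Python file; if a module is missing, install it and try again
    run_python_file <- function(python_file){
      a = try(reticulate::py_run_file(python_file), silent=TRUE)
      if(inherits(a,"try-error") && grepl("ModuleNotFoundError",a)){
        # Extract the module name from the error message and install it with pip
        # (assumes `python` on the PATH is the interpreter reticulate is using)
        system(sprintf("python -m pip install %s",gsub(".* |\\W","",c(a))))
        run_python_file(python_file)
      }
      else a
    }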
    data  = run_python_file("readfile.py")
    
    data$a
    data$b
    

    readfile.py

    #!/usr/bin/python
    import joblib
    a, b = joblib.load("data.dat")
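
Since the R code above reads `a` and `b` from the script's top-level variables, it can be useful to confirm on the Python side that the loaded objects really have the dimensions described in the question (50*5000*30 and 50*5000*5) before working with them in R. The following is a minimal sketch of such a check, not part of the original answer. `nested_shape` and `load_checked` are hypothetical helper names (they belong to neither joblib nor reticulate), and the sketch assumes the data are rectangular nested lists, as the question states.

```python
def nested_shape(obj):
    """Return the dimensions of a rectangular nested list, e.g. (50, 5000, 30).

    Only the first element at each level is inspected, so ragged input
    is not detected (an assumption of this sketch).
    """
    shape = []
    while isinstance(obj, (list, tuple)):
        shape.append(len(obj))
        if len(obj) == 0:
            break
        obj = obj[0]
    return tuple(shape)


def load_checked(path="data.dat"):
    """Load (a, b) with joblib and print their shapes before returning them."""
    import joblib  # imported lazily so nested_shape works without joblib installed

    a, b = joblib.load(path)
    print("a:", nested_shape(a), "b:", nested_shape(b))
    return a, b


if __name__ == "__main__":
    # Tiny stand-in data with the same nesting depth as the lists in the question.
    demo = [[[0] * 3 for _ in range(4)] for _ in range(2)]
    print(nested_shape(demo))  # (2, 4, 3)
```

If the shapes printed by `load_checked` do not match the expected dimensions, the problem lies in the file or the load step rather than in the conversion to R.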