Is it possible to dynamically load files inside a cppFunction in R?


I'm working on a problem in which I would greatly benefit from being able to load vectors saved on disk dynamically inside a loop, as this lets me skip calculating the vectors on the fly (in my actual process, one vector is used many times, and the collection of vectors as a matrix is too big to hold in memory all at once). As a simplified example, let's say that the vectors are stored in a directory with path prefix, each in its own file. The files are named vec0.txt, vec1.txt, vec2.txt, and so on. We wish to sum all the numbers of all the vectors in the inclusive range start to end. The size of all vectors is known and is always the same. I thought of something like:

library(Rcpp)
cppFunction('int sumvectors(int start, int end, std::string prefix, int size) {
    int i;
    int j;
    std::vector<int> arr(size);  // zero-initialized placeholder for the loaded vector
    int sum=0;
    for (i=start; i <= end; i++) {
        // Here you would construct the path to the file paste0(prefix, vec, i, ".txt")
        // Then load it and put it into an array
        for (j=0; j < size; j++) {
            sum+=arr[j];
        }
    }
    return sum;
}')

Is something like this possible? I'm OK at R but have never worked with C or C++, so I don't know whether this is even doable with Rcpp.


Solution

  • Yes, this is certainly possible. If your numbers are written in plain text files separated by spaces like this:


    C:/Users/Administrator/vec1.txt

    5.1 21.4 563 -21.2 35.6
    

    C:/Users/Administrator/vec2.txt

    3 6 8 7 10 135
    

    Then you can write the following function:

    cppFunction("
    // Returns one sum per file, for the files <path>1.txt and <path>2.txt
    std::vector<float> read_floats(const std::string& path)
    {
      std::vector<float> result;

      // The file indices are hardcoded here; adjust the bounds to your range
      for(int i = 1; i < 3; ++i)
      {
        std::string file_path = path + std::to_string(i) + \".txt\";
        std::ifstream myfile(file_path.c_str(), std::ios_base::in);
        float a, vec_sum = 0;
        std::vector<float> vec;
        // Read whitespace-separated numbers until the end of the file
        while(myfile >> a)
        {
          vec.push_back(a);
        }
        for(std::vector<float>::iterator it = vec.begin(); it != vec.end(); ++it)
        {
          vec_sum += *it;
        }
        result.push_back(vec_sum);
      }
      return result;
    }", includes = c("#include <string>", "#include <fstream>", "#include <vector>"))
    
    

    This creates an R function that you can call like this:

    read_floats("c:/Users/Administrator/vec")
    #> [1] 603.9 169.0
    

    You can confirm that these are the sums of the numbers in each file.