c++ multidimensional-array eigen memory-mapped-files eigen3

Eigen and huge dense 2D arrays


I am using 2D Eigen::Arrays for a project, and I would like to keep using them even when the arrays are huge.

To avoid memory issues, I thought of using memory-mapped files to manage (read/modify/write) these arrays, but I cannot find any working examples.

The closest example I have found is one based on boost::interprocess, but it uses shared memory (whereas I'd prefer persistent storage).

The lack of examples makes me wonder whether there is a better, mainstream alternative solution to my problem. Is this the case? A minimal example would be very handy.

EDIT:

This is a minimal example; the comments explain my use case:

#include <Eigen/Dense>


int main()
{
    // Order of magnitude of the required arrays
    Eigen::Index rows = 50000;
    Eigen::Index cols = 40000;

    {
        // Array creation (this is where the memory mapped file should be created)
        Eigen::ArrayXXf arr1 = Eigen::ArrayXXf::Zero( rows, cols );

        // Some operations on the array
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr1( i, j ) = float(i * j);
            }
        }

        // The array goes out of scope, but the data are persistently stored in the file
    }

    {
        // This should actually use the data stored in the file
        Eigen::ArrayXXf arr2 = Eigen::ArrayXXf::Zero( rows, cols );

        // Manipulation of the array data
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr2( i, j ) += 1.0f;
            }
        }

        // The array goes out of scope, but the data are persistently stored in the file
    }

}
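
For scale, here is a rough sketch of the arithmetic behind the sizes above. It assumes a 4-byte float, and the helper name dense_float_bytes is hypothetical (it is not part of Eigen):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for a dense rows x cols array of float.
// 64-bit arithmetic is used because the byte count does not fit in 32 bits.
inline std::int64_t dense_float_bytes(std::int64_t rows, std::int64_t cols) {
  assert(rows >= 0 && cols >= 0);
  return rows * cols * static_cast<std::int64_t>(sizeof(float));
}
```

With rows = 50000 and cols = 40000 this gives 8,000,000,000 bytes, roughly 7.45 GiB for a single array, which is why keeping several such arrays in RAM at once becomes a problem.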

Solution

  • Based on this comment and these answers (https://stackoverflow.com/a/51256963/2741329 and https://stackoverflow.com/a/51256597/2741329), this is my working solution:

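    #include <boost/interprocess/file_mapping.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <Eigen/Dense>
    #include <cstdint>
    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <string>
    
    // std::filesystem needs C++17; older toolchains expose it as
    // std::experimental::filesystem via <experimental/filesystem>
    namespace fs = std::filesystem;
    namespace bi = boost::interprocess;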
    
    int main() {
    
      std::string array_bin_path = "array.bin";
      const int64_t nr_rows = 28000;
      const int64_t nr_cols = 35000;
      const int64_t array_size = nr_rows * nr_cols * sizeof(float);
      std::cout << "array size: " << array_size << std::endl;
    
      // if the file already exists but the size is different, remove it
      if(fs::exists(array_bin_path))
      {
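        // file_size() returns an unsigned type; convert explicitly for the comparison below
        const int64_t file_size = static_cast<int64_t>(fs::file_size(array_bin_path));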
        std::cout << "file size: " << file_size << std::endl;
        if(array_size != file_size)
        {
          fs::remove(array_bin_path);
        }
      }
    
      // create a binary file of the required size
      if(!fs::exists(array_bin_path))
      {
        std::ofstream ofs(array_bin_path, std::ios::binary | std::ios::out | std::ios::trunc);
        ofs.seekp(array_size - 1);
        ofs.put(0);
        ofs.close();
      }
    
      // use boost interprocess to memory map the file
      const bi::file_mapping mapped_file(array_bin_path.c_str(), bi::read_write);
      bi::mapped_region region(mapped_file, bi::read_write);
    
      // get the address of the mapped region
      void * addr = region.get_address();
    
      const std::size_t region_size = region.get_size();
      std::cout << "region size: " << region_size << std::endl;
    
      // map the file content into a Eigen array
      Eigen::Map<Eigen::ArrayXXf> my_array(static_cast<float*>(addr), nr_rows, nr_cols);
    
      // modify the content
      std::cout << "initial array(0, 1) value: " << my_array(0, 1) << std::endl;
      my_array(0, 1) += 1.234f;
      std::cout << "final array(0, 1) value: " << my_array(0, 1) << std::endl;
    
      return 0;
    }
    

    It uses:

    • boost::interprocess in place of boost::iostreams because it is header-only. In addition, mapped_region is handy in case I want to store multiple arrays in a single mapped file.
    • std::ofstream to create the binary file and the filesystem library (std::experimental::filesystem, or std::filesystem since C++17) to check it.
    • Eigen::ArrayXXf as required in my question.
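
As a rough illustration of the "multiple arrays in a single mapped file" idea, here is a minimal sketch that stores two arrays back to back in one file. To keep it free of dependencies it uses plain POSIX mmap instead of Boost, and a hypothetical ColMajorView struct instead of Eigen::Map; the helper names map_file and two_arrays_round_trip are also made up for this example. ColMajorView uses the same column-major layout as the default Eigen::ArrayXXf, so element (i, j) lives at data[i + j * rows].

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for Eigen::Map<Eigen::ArrayXXf>: column-major view on raw floats.
struct ColMajorView {
  float* data;
  std::size_t rows;
  std::size_t cols;
  float& operator()(std::size_t i, std::size_t j) {
    assert(i < rows && j < cols);
    return data[i + j * rows];
  }
};

// Maps `path` read/write, sizing the file to `bytes`; returns nullptr on failure.
// A newly created file is zero-filled; an existing file of the same size keeps its data.
inline float* map_file(const char* path, std::size_t bytes) {
  const int fd = open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0) return nullptr;
  if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
    close(fd);
    return nullptr;
  }
  void* addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  return addr == MAP_FAILED ? nullptr : static_cast<float*>(addr);
}

// Writes into two arrays stored in one file, then remaps the file and reads them back.
inline bool two_arrays_round_trip(const char* path) {
  const std::size_t rows_a = 4, cols_a = 3;  // array A: first 12 floats
  const std::size_t rows_b = 2, cols_b = 5;  // array B: next 10 floats
  const std::size_t count_a = rows_a * cols_a;
  const std::size_t count_b = rows_b * cols_b;
  const std::size_t bytes = (count_a + count_b) * sizeof(float);

  float* base = map_file(path, bytes);
  if (base == nullptr) return false;
  ColMajorView a{base, rows_a, cols_a};
  ColMajorView b{base + count_a, rows_b, cols_b};  // B starts right after A
  a(0, 1) = 1.5f;
  b(1, 4) = 2.5f;
  munmap(base, bytes);  // views are invalid from here on

  base = map_file(path, bytes);  // second mapping sees the persisted values
  if (base == nullptr) return false;
  ColMajorView a2{base, rows_a, cols_a};
  ColMajorView b2{base + count_a, rows_b, cols_b};
  const bool ok = a2(0, 1) == 1.5f && b2(1, 4) == 2.5f && base[rows_a] == 1.5f;
  munmap(base, bytes);
  return ok;
}
```

Calling two_arrays_round_trip("arrays.bin") should return true on a POSIX system. With Eigen available, the views could instead be built as Eigen::Map<Eigen::ArrayXXf>(base, rows_a, cols_a) and Eigen::Map<Eigen::ArrayXXf>(base + count_a, rows_b, cols_b), since Eigen::Map only needs a pointer plus the dimensions.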