I am using 2D Eigen::Arrays for a project, and I would like to keep using them even for huge 2D arrays.
To avoid memory issues, I thought of using memory-mapped files to manage (read/modify/write) these arrays, but I cannot find working examples.
The closest example that I have found is this one based on boost::interprocess, but it uses shared memory (while I would prefer persistent storage).
The lack of examples makes me wonder whether there is a better, mainstream alternative solution to my problem. Is this the case? A minimal example would be very handy.
EDIT:
This is a minimal example explaining my use case in the comments:
#include <Eigen/Dense>

int main()
{
    // Order of magnitude of the required arrays
    Eigen::Index rows = 50000;
    Eigen::Index cols = 40000;
    {
        // Array creation (this is where the memory mapped file should be created)
        Eigen::ArrayXXf arr1 = Eigen::ArrayXXf::Zero( rows, cols );
        // Some operations on the array
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr1( i, j ) = float(i * j);
            }
        }
        // The array goes out of scope, but the data are persistently stored in the file
    }
    {
        // This should actually use the data stored in the file
        Eigen::ArrayXXf arr2 = Eigen::ArrayXXf::Zero( rows, cols );
        // Manipulation of the array data
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr2( i, j ) += 1.0f;
            }
        }
        // The array goes out of scope, but the data are persistently stored in the file
    }
}
Based on this comment and these answers (https://stackoverflow.com/a/51256963/2741329 and https://stackoverflow.com/a/51256597/2741329), this is my working solution:
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <Eigen/Dense>
#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
// std::experimental::filesystem lives in this header; with C++17,
// <filesystem> and std::filesystem can be used instead
#include <experimental/filesystem>

namespace fs = std::experimental::filesystem;
namespace bi = boost::interprocess;

int main() {
    std::string array_bin_path = "array.bin";
    const int64_t nr_rows = 28000;
    const int64_t nr_cols = 35000;
    // 64-bit arithmetic: the size in bytes does not fit in a 32-bit int
    const int64_t array_size = nr_rows * nr_cols * sizeof(float);
    std::cout << "array size: " << array_size << std::endl;

    // if the file already exists but the size is different, remove it
    if(fs::exists(array_bin_path))
    {
        int64_t file_size = fs::file_size(array_bin_path);
        std::cout << "file size: " << file_size << std::endl;
        if(array_size != file_size)
        {
            fs::remove(array_bin_path);
        }
    }

    // create a binary file of the required size
    // (seek to the last byte and write a single zero)
    if(!fs::exists(array_bin_path))
    {
        std::ofstream ofs(array_bin_path, std::ios::binary | std::ios::out | std::ios::trunc);
        ofs.seekp(array_size - 1);
        ofs.put(0);
        ofs.close();
    }

    // use boost interprocess to memory map the file
    const bi::file_mapping mapped_file(array_bin_path.c_str(), bi::read_write);
    bi::mapped_region region(mapped_file, bi::read_write);

    // get the address of the mapped region
    void * addr = region.get_address();
    const std::size_t region_size = region.get_size();
    std::cout << "region size: " << region_size << std::endl;

    // map the file content into a Eigen array (column-major, like ArrayXXf)
    Eigen::Map<Eigen::ArrayXXf> my_array(static_cast<float*>(addr), nr_rows, nr_cols);

    // modify the content
    std::cout << "initial array(0, 1) value: " << my_array(0, 1) << std::endl;
    my_array(0, 1) += 1.234f;
    std::cout << "final array(0, 1) value: " << my_array(0, 1) << std::endl;

    return 0;
}
It uses:
- boost::interprocess in place of boost::iostreams because it is header-only. In addition, mapped_region is handy in case I want to store multiple arrays in a single mapped file.
- std::fstream to create the binary file and std::experimental::filesystem to check it.
- Eigen::ArrayXXf as required in my question.