Tags: c++, binaryfiles, ofstream

How to write a large vector to a file correctly?


I want to save a calculated vector of data to a file so I can load it later, avoiding the need to recalculate it every time the program starts.

The issue is that this works fine with relatively small datasets — for example, a vector of 1 million elements — but when I try the full dataset of 32 million elements, the program gets stuck. If I forcibly close it, the saved file does contain the correct amount of data (2.304 GB, since each element is 72 bytes), yet execution never reaches outFile.close();. Why doesn't the program continue?

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Function to save a vector of KlineDataArray to a binary file
void saveDataToFile(const std::vector<KlineDataArray>& data, const std::string& filePath) 
{
    std::ofstream outFile(filePath, std::ios::binary);
    if (!outFile.is_open()) 
    {
        std::cerr << "Failed to open file for writing: " << filePath << '\n';
        return;
    }

    size_t dataSize{ data.size() };
    outFile.write(reinterpret_cast<const char*>(&dataSize), sizeof(size_t));
    outFile.write(reinterpret_cast<const char*>(data.data()), dataSize * sizeof(KlineDataArray));
    outFile.close();
}

Solution

  • The second argument to std::ostream::write is std::streamsize, which ...

    is an implementation-defined signed integral type used to represent the number of characters transferred in an I/O operation or the size of an I/O buffer. It is used as a signed counterpart of std::size_t, similar to the POSIX type ssize_t.

    On a 32-bit platform, that will be equivalent to an int32_t. And 32 million times 72 is 2,304,000,000 bytes — just past INT32_MAX (2,147,483,647), enough to overflow into negative numbers...

    I suggest you write in chunks of 720 MB or so, or forget that 32-bit exists and just compile as 64-bit.
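    The chunked approach might look like the sketch below. The element type and function name are placeholders (the original KlineDataArray definition isn't shown, so a 72-byte struct stands in for it); the chunk size of roughly 720 MB keeps every individual write() well under the 2^31-1 limit of a 32-bit std::streamsize. Writing the element count as a fixed-width std::uint64_t instead of size_t also keeps the file format identical between 32-bit and 64-bit builds.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    // Hypothetical 72-byte element, standing in for KlineDataArray.
    struct KlineDataArray {
        double values[9]; // 9 * 8 bytes = 72 bytes
    };

    // Write the vector in fixed-size chunks so that no single call to
    // std::ostream::write() receives a byte count that could overflow a
    // 32-bit std::streamsize.
    bool saveDataChunked(const std::vector<KlineDataArray>& data,
                         const std::string& filePath)
    {
        std::ofstream outFile(filePath, std::ios::binary);
        if (!outFile)
            return false;

        // Fixed-width element count: same on-disk layout on every platform.
        const std::uint64_t dataSize = data.size();
        outFile.write(reinterpret_cast<const char*>(&dataSize), sizeof dataSize);

        // ~720 MB per chunk, expressed in elements (10,485,760 elements here).
        const std::size_t chunkElems =
            (720ull * 1024 * 1024) / sizeof(KlineDataArray);

        for (std::size_t i = 0; i < data.size(); i += chunkElems) {
            const std::size_t n = std::min(chunkElems, data.size() - i);
            outFile.write(reinterpret_cast<const char*>(data.data() + i),
                          static_cast<std::streamsize>(n * sizeof(KlineDataArray)));
            if (!outFile)
                return false; // bail out on a failed write instead of hanging
        }
        return outFile.good();
    }
    ```

    Loading is the mirror image: read the std::uint64_t count, resize the vector, then read() the payload (again in chunks if you stay on 32-bit).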