Faster way of loading (big) std::vector<std::vector<float>> from file


I have implemented a way to save a std::vector of vectors to a file and read it back, using this code (found here on Stack Overflow):

Saving:

void saveData(std::string path)
{
    std::ofstream FILE(path, std::ios::out | std::ofstream::binary);

    // Store size of the outer vector
    int s1 = RecordData.size();
    FILE.write(reinterpret_cast<const char*>(&s1), sizeof(s1));

    // Now write each vector one by one
    for (auto& v : RecordData) {
        // Store its size
        int size = v.size();
        FILE.write(reinterpret_cast<const char*>(&size), sizeof(size));

        // Store its contents
        FILE.write(reinterpret_cast<const char*>(v.data()), v.size() * sizeof(float)); // v.data(), unlike &v[0], is well-defined for an empty vector
    }
    FILE.close();
}

Reading:

void loadData(std::string path)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);

    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }

    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (int n = 0; n < size; ++n) {
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        float f;
        //RecordData[n].reserve(size2); // This doesn't make a difference in speed
        for (int k = 0; k < size2; ++k) {
            FILE.read(reinterpret_cast<char*>(&f), sizeof(f));
            RecordData[n].push_back(f);
        }
    }
}

This works perfectly, but loading a big dataset (980 MB; inner vectors of size 32000, and 1600 of them) takes ~7-8 seconds, in contrast to saving, which finishes in under 1 second. Since I can see memory usage in Visual Studio climbing slowly during loading, my guess would be a lot of small memory allocations. The commented-out line RecordData[n].reserve(size2); doesn't make a difference, though.

Can anybody give me a faster way of loading this kind of data? My first try was putting all the data into one big std::vector<float>, but that for some reason seemed to give some kind of overflow (which shouldn't happen, because sizeof(int) = 4, so ~2 billion for a signed int should be enough for an index variable; does std::vector use something else internally?). Also, it would be really nice to keep the data structure of std::vector<std::vector<float>>. In the future I will have to handle far bigger datasets (although I will probably use short for those to save memory and handle the values as fixed-point numbers), so loading speed will become even more significant...
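For what it's worth, std::vector indexes with its size_type, which is std::size_t (64-bit on typical 64-bit platforms), so the container itself is not limited to the range of int; an overflow is more likely to come from int arithmetic on byte counts. A minimal sketch of the one-big-vector idea (FlatRecords and loadFlat are invented names; it assumes the file format written by saveData above): keep all floats in one contiguous buffer plus a vector of row offsets, so each row is loaded with a single bulk read into storage grown once per row.

// Hypothetical flat-storage variant (invented names; assumes the file
// format written by saveData above).
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct FlatRecords {
    std::vector<float> data;          // all rows, back to back
    std::vector<std::size_t> offsets; // offsets[n] = start of row n
};

FlatRecords loadFlat(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    FlatRecords r;
    int rows = 0;
    in.read(reinterpret_cast<char*>(&rows), sizeof(rows));
    r.offsets.reserve(rows + 1);
    r.offsets.push_back(0);
    for (int n = 0; n < rows; ++n) {
        int size2 = 0;
        in.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        r.data.resize(r.data.size() + size2); // grow, then fill in place
        in.read(reinterpret_cast<char*>(r.data.data() + r.offsets.back()),
                size2 * sizeof(float));
        r.offsets.push_back(r.data.size());
    }
    return r; // row n spans [offsets[n], offsets[n + 1]) within data
}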

Edit:

I should point out that 32000 for the inner vectors and 1600 for the outer vector are just examples; both can vary. I think I would have to save an "index vector" as the first inner vector to declare the number of items in the rest (like I said in a comment: I'm a first-time file reader/writer and haven't used std::vector for more than a week or two, so I'm not sure about that). I will look into block reading and post the result in a later edit...
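To make that "index vector" idea concrete, here is a hedged sketch (my reading of it; saveIndexed and loadIndexed are invented names): write the outer count, then all inner sizes as one block, then the raw float rows back to back. The reader then knows every size up front and needs only one bulk read per row:

// Hypothetical "index vector first" file layout (invented names; not
// the format used by saveData above): count, all sizes, then raw rows.
#include <fstream>
#include <string>
#include <vector>

using RV = std::vector<std::vector<float>>;

void saveIndexed(const std::string& path, const RV& rows)
{
    std::ofstream out(path, std::ios::binary);
    int count = static_cast<int>(rows.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));

    std::vector<int> sizes;           // the "index vector"
    sizes.reserve(rows.size());
    for (const auto& v : rows)
        sizes.push_back(static_cast<int>(v.size()));
    out.write(reinterpret_cast<const char*>(sizes.data()),
              sizes.size() * sizeof(int));

    for (const auto& v : rows)        // raw float data, back to back
        out.write(reinterpret_cast<const char*>(v.data()),
                  v.size() * sizeof(float));
}

void loadIndexed(const std::string& path, RV& rows)
{
    std::ifstream in(path, std::ios::binary);
    int count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof(count));

    std::vector<int> sizes(count);    // one read recovers every size
    in.read(reinterpret_cast<char*>(sizes.data()), count * sizeof(int));

    rows.resize(count);
    for (int n = 0; n < count; ++n) { // one bulk read per row
        rows[n].resize(sizes[n]);
        in.read(reinterpret_cast<char*>(rows[n].data()),
                sizes[n] * sizeof(float));
    }
}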

Edit2:

So, here is the version of perivesta (thank you for that). The only change I made is dropping the RV& RecordData parameter, because RecordData is a global variable for me.

Curiously, this brings my loading time down only from ~7000 ms to ~1500 ms for a 980 MB file, whereas for perivesta it went from 7429 ms to 644 ms for a 2 GB file (strange how speeds differ between systems ;-) ). One further idea along these lines is sketched after the listing below.

void loadData2(std::string path)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);

    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }

    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (auto& v : RecordData) {
        // load its size
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        v.resize(size2);

        // load its contents
        FILE.read(reinterpret_cast<char*>(v.data()), v.size() * sizeof(float));
    }
}
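If per-call stream overhead is what remains, a speculative further variant (loadData3 is an invented name; untested on the dataset above) is to read the entire file into one buffer with a single read and decode it in memory. Note this temporarily needs roughly twice the memory:

// Hypothetical single-read variant (invented name; same file format as
// above): slurp the whole file, then decode from the in-memory buffer.
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

using RV = std::vector<std::vector<float>>;

void loadData3(const std::string& path, RV& RecordData)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    const std::streamsize bytes = in.tellg(); // opened at end, so this is the file size
    in.seekg(0);
    std::vector<char> buf(static_cast<std::size_t>(bytes));
    in.read(buf.data(), bytes);               // the single big read

    const char* p = buf.data();
    auto readInt = [&p] {
        int v;
        std::memcpy(&v, p, sizeof v);
        p += sizeof v;
        return v;
    };

    RecordData.resize(readInt());
    for (auto& v : RecordData) {
        v.resize(readInt());
        std::memcpy(v.data(), p, v.size() * sizeof(float));
        p += v.size() * sizeof(float);
    }
}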

Solution

  • This is an implementation of Alan Birtles' comment: when reading, read each inner vector with a single FILE.read call instead of many individual ones. This reduces the time dramatically on my system:

    These are the results for a 2GB file:

    Writing    took 2283 ms
    Reading v1 took 7429 ms
    Reading v2 took 644 ms
    

    Here is the code that produces this output:

    #include <vector>
    #include <iostream>
    #include <string>
    #include <chrono>
    #include <random>
    #include <fstream>
    
    using RV = std::vector<std::vector<float>>;
    
    void saveData(std::string path, const RV& RecordData)
    {
        std::ofstream FILE(path, std::ios::out | std::ofstream::binary);
    
        // Store size of the outer vector
        int s1 = RecordData.size();
        FILE.write(reinterpret_cast<const char*>(&s1), sizeof(s1));
    
        // Now write each vector one by one
        for (auto& v : RecordData) {
            // Store its size
            int size = v.size();
            FILE.write(reinterpret_cast<const char*>(&size), sizeof(size));
    
            // Store its contents
            FILE.write(reinterpret_cast<const char*>(v.data()), v.size() * sizeof(float));
        }
        FILE.close();
    }
    
    //original version for comparison
    void loadData1(std::string path, RV& RecordData)
    {
        std::ifstream FILE(path, std::ios::in | std::ifstream::binary);
    
        if (RecordData.size() > 0) // Clear data
        {
            for (int n = 0; n < RecordData.size(); n++)
                RecordData[n].clear();
            RecordData.clear();
        }
    
        int size = 0;
        FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
        RecordData.resize(size);
        for (int n = 0; n < size; ++n) {
            int size2 = 0;
            FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
            float f;
            //RecordData[n].reserve(size2); // This doesn't make a difference in speed
            for (int k = 0; k < size2; ++k) {
                FILE.read(reinterpret_cast<char*>(&f), sizeof(f));
                RecordData[n].push_back(f);
            }
        }
    }
    
    //my version
    void loadData2(std::string path, RV& RecordData)
    {
        std::ifstream FILE(path, std::ios::in | std::ifstream::binary);
    
        if (RecordData.size() > 0) // Clear data
        {
            for (int n = 0; n < RecordData.size(); n++)
                RecordData[n].clear();
            RecordData.clear();
        }
    
        int size = 0;
        FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
        RecordData.resize(size);
        for (auto& v : RecordData) {
            // load its size
            int size2 = 0;
            FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
            v.resize(size2);
    
            // load its contents
            FILE.read(reinterpret_cast<char*>(v.data()), v.size() * sizeof(float));
        }
    }
    
    int main()
    {
        using namespace std::chrono;
        const std::string filepath = "./vecdata";
        const std::size_t sizeOuter = 16000;
        const std::size_t sizeInner = 32000;
        RV vecSource;
        RV vecLoad1;
        RV vecLoad2;
    
        const auto tGen1 = steady_clock::now();
        std::cout << "generating random numbers..." << std::flush;
        std::random_device dev;
        std::mt19937 rng(dev());
        std::uniform_real_distribution<float> dis;
        for(std::size_t i = 0; i < sizeOuter; ++i)
        {
            RV::value_type inner;
            for(std::size_t k = 0; k < sizeInner; ++k)
            {
                inner.push_back(dis(rng));
            }
            vecSource.push_back(inner);
        }
        const auto tGen2 = steady_clock::now();
    
        std::cout << "done\nSaving..." << std::flush;
        const auto tSave1 = steady_clock::now();
        saveData(filepath, vecSource);
        const auto tSave2 = steady_clock::now();
    
        std::cout << "done\nReading v1..." << std::flush;
        const auto tLoadA1 = steady_clock::now();
        loadData1(filepath, vecLoad1);
        const auto tLoadA2 = steady_clock::now();
        std::cout << "verifying..." << std::flush;
        if(vecSource != vecLoad1) std::cout << "FAILED! ...";
    
        std::cout << "done\nReading v2..." << std::flush;
        const auto tLoadB1 = steady_clock::now();
        loadData2(filepath, vecLoad2);
        const auto tLoadB2 = steady_clock::now();
        std::cout << "verifying..." << std::flush;
        if(vecSource != vecLoad2) std::cout << "FAILED! ...";
    
    
        std::cout << "done\nResults:\n" <<
            "Generating took " << duration_cast<milliseconds>(tGen2 - tGen1).count() << " ms\n" <<
            "Writing    took " << duration_cast<milliseconds>(tSave2 - tSave1).count() << " ms\n" <<
            "Reading v1 took " << duration_cast<milliseconds>(tLoadA2 - tLoadA1).count() << " ms\n" <<
            "Reading v2 took " << duration_cast<milliseconds>(tLoadB2 - tLoadB1).count() << " ms\n" <<
            std::flush;
    }
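
    One caveat, not part of the measurements above: none of the read calls are checked, so a truncated or corrupt file silently yields garbage or a huge resize. A small guard, as a sketch (readSizeChecked is an invented helper; <stdexcept> is the only include not already pulled in above), could validate each size field before it is used:

    // Hypothetical hardening, not in the benchmarked code: fail loudly
    // if a read does not complete or a decoded size is negative.
    #include <stdexcept>

    int readSizeChecked(std::ifstream& in, const char* what)
    {
        int n = 0;
        if (!in.read(reinterpret_cast<char*>(&n), sizeof(n)) || n < 0)
            throw std::runtime_error(std::string("bad or truncated field: ") + what);
        return n;
    }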