Search code examples
c++fstreambinaryfilesstdvector

Why my C++ program is not running correctly when I call these two function together?


So I created this program to write vectors of integer into a binary file, then retrieve the data again back.

//vec.cpp
#include <iostream>
#include <vector>
#include <fstream>

template<typename T>
void writeKey(std::string filename, std::vector<T> arr)
{

    arr.insert(arr.begin(),(T)arr.size());
    
    std::ofstream write_bin(filename, std::ios::out | std::ios::binary);
    if(!write_bin)
    {
        std::cout << "ERROR: writing vector to file!\n";
        exit(1);
    }
    
    for(size_t i = 0; i < arr.size(); i++)
    {
        write_bin.write((char *) &arr[i], sizeof(int));
    }

    write_bin.close();
    if(!write_bin.good())
    {
        std::cout<<"ERROR: writing time error\n";
        exit(1);
    }
}

template<typename T>
std::vector<T> readKey(std::string filename)
{
    std::ifstream read_bin(filename, std::ios::out | std::ios::binary);
    if(!read_bin)
    {
        std::cout<<"ERROR: reading binary file!\n";
        exit(1);
    }

    size_t limit;
    read_bin.read((char*)&limit, sizeof(T));

    std::cout<<"limit : "<<limit<<'\n';

    std::vector<T> arr(limit,0);

    for(size_t i=0; i<limit; ++i)
    {
        read_bin.read((char*)&arr[i], sizeof(T));
    }

    read_bin.close();

    return arr;
}

int main()
{
    
    std::vector<int> mykey = {5,10,15,20};
    writeKey("test.key", mykey);

    std::vector<int> mykeys = readKey<int>("test.key");

    for(auto e: mykeys)
        std::cout<<e<<' ';
    std::cout<<'\n';

    return 0;
}

So you see what I did here is I compile the program that only call the writeKey() function then run it... The program it runs perfectly

int main()
{
    
    std::vector<int> mykey = {5,10,15,20};
    writeKey("test.key", mykey);

    return 0;
}

Then I compiled it again but this time I only call the readkey() function, then I run it, and again it runs as intended

int main()
{
    std::vector<int> mykeys = readKey<int>("test.key");

    for(auto e: mykeys)
        std::cout<<e<<' ';
    std::cout<<'\n';

    return 0;
}

The problem arises the moment I call these two function inside the main() function, then compile and run it here the limit variable in readkey function is having some kind of overflowed value instead of the value that I inserted at the begining of the vector in the writekey function

int main()
{
    
    std::vector<int> mykey = {5,10,15,20};
    writeKey("test.key", mykey);

    std::vector<int> mykeys = readKey<int>("test.key");

    for(auto e: mykeys)
        std::cout<<e<<' ';
    std::cout<<'\n';

    return 0;
}

What is happening here? and how can I fix this?

this is my compilation flags : g++ -o vec.o vec.cpp -Wall -Wextra -fsanitize=address


Solution

  • As already mentioned in the comments by @fabian you are treating your arr.size() as an int (T) when it is actually from type std::size_t. The problem is that std::size_t is 8bytes long (if you are running your program on a 64bit machine) and int only 4bytes.

    First, you are converting your std::size_t to an int in the line arr.insert(arr.begin(),(T)arr.size()); so it "fits" into your vector. You are effectively get rid of half the bytes while doing this. When you then write your vector to the file you wrote 4bytes instead of the needed 8. Now if you read back your values with read_bin.read((char*)&limit, sizeof(T)); you are reading, well, 4bytes into limit, which now has the type std::size_t, i.e. consists of 8bytes again so only the first 4 bytes of your limit are changed the rest is left untouched. Running this code in VS in debug-mode you can examine what happens (bytes that changed are marked red): enter image description here

    Only the first 4 bytes change and the rest is left untouched. Because I ran it in debug-mode the rest is set to cc so the value of the limit becomes 14757395255531667460. This value is too large for the vector and so std::length_error is thrown. If you would run this in release mode you would get undefined behavior because you cannot know what bytes were on the location before the limit variable. Maybe everything runs ok (because all the bytes were 0) but likely you will get a wrong value for the limit.

    To fix this just treat arr.size() as what it is, a std::size_t and don't store it in the first position of your vector (which can only hold T), instead just write it to your file before you write your values and replace sizeof(int) with sizeof(T) (this only works in this example), so your write function becomes:

    template<typename T>
    void writeKey(const std::string& filename, const std::vector<T>& arr) {
    
        std::ofstream write_bin(filename, std::ios::out | std::ios::binary);
        if (!write_bin) {
            std::cout << "ERROR: writing vector to file!\n";
            exit(1);
        }
    
        std::size_t limit = arr.size();
        write_bin.write(reinterpret_cast<char*>(&limit), sizeof(std::size_t));
    
        if (limit != 0)
            write_bin.write(reinterpret_cast<const char*>(&arr[0]), sizeof(T) * limit);
    
        write_bin.close();
        if (!write_bin.good()) {
            std::cout << "ERROR: writing time error\n";
            exit(1);
        }
    }
    

    Now, you only have to read your limit back before you read all the other values so your read function becomes:

    template<typename T>
    std::vector<T> readKey(const std::string& filename) {
    
        std::ifstream read_bin(filename, std::ios::out | std::ios::binary);
        if (!read_bin) {
            std::cout << "ERROR: reading binary file!\n";
            exit(1);
        }
    
        std::size_t limit;
        read_bin.read(reinterpret_cast<char*>(&limit), sizeof(std::size_t));
    
        std::vector<T> arr(limit, 0);
        if (limit != 0)
            read_bin.read(reinterpret_cast<char*>(&arr[0]), sizeof(T) * limit);
    
        read_bin.close();
    
        std::cout << "limit : " << limit << '\n';
    
        return arr;
    }
    

    I changed some other things like passing const references to the functions and removing the for-loops with a single write/read call. Also, see When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used?.