
Having trouble serializing binary data using ifstream and ofstream


I am trying to serialize a plain old data (POD) structure using ifstream and ofstream, and I wasn't able to get it to work. I then reduced my problem to an ultra-basic serialization of just a char and an int, and even that didn't work. Clearly I'm missing something fundamental.

For a basic structure:

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(std::ofstream& ofs);
};

With serialize function:

void SerializeTestStruct::Serialize(std::ofstream& ofs)
{
    bool isError = (false == ofs.good());
    if (false == isError)
    {
        ofs.write((char*)&mCharVal, sizeof(mCharVal));
        ofs.write((char*)&mIntVal, sizeof(mIntVal));
    }
}

Why would this fail with the following short program?

//ultra basic serialization test.
    SerializeTestStruct* testStruct = new SerializeTestStruct();
    testStruct->mCharVal = 'y';
    testStruct->mIntVal = 9;

    //write
    std::string testFileName = "test.bin";
    std::ofstream fileOut(testFileName.data());
    fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
    fileOut.clear();
    testStruct->Serialize(fileOut);

    fileOut.flush();
    fileOut.close();

    delete testStruct;

    //read
    char * memblock;
    std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
    if (fileIn.is_open())
    {
        // get length of file:
        fileIn.seekg (0, std::ifstream::end);
        int length = fileIn.tellg();
        fileIn.seekg (0, std::ifstream::beg);

        // allocate memory:
        memblock = new char [length];
        fileIn.read(memblock, length);
        fileIn.close();

        // read data as a block:
        SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

        delete[] testStruct2;
    }

When I run through the code I notice that memblock has a "y" at the top so maybe it is working and it's just a problem with the placement new at the very end? After that placement new I end up with a SerializeTestStruct with values: 0, 0.


Solution

  • Here is how your stuff should read:

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <stdexcept>
    
    struct SerializeTestStruct
    {
        char mCharVal;
        unsigned int mIntVal;
    
        void Serialize(::std::ostream &os);
        static SerializeTestStruct Deserialize(::std::istream &is);
    };
    
    void SerializeTestStruct::Serialize(std::ostream &os)
    {
        if (os.good())
        {
            os.write((char*)&mCharVal, sizeof(mCharVal));
            os.write((char*)&mIntVal, sizeof(mIntVal));
        }
    }
    
    SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
    {
        SerializeTestStruct retval;

        // read() on a stream that is not good() sets failbit, so a single
        // check after both reads covers every failure case.
        is.read((char*)&retval.mCharVal, sizeof(retval.mCharVal));
        is.read((char*)&retval.mIntVal, sizeof(retval.mIntVal));
        if (is.fail()) {
            throw ::std::runtime_error("failed to read full struct");
        }
        return retval;
    }
    
    int main()
    {
        // ultra basic serialization test.
    
        // setup
        const ::std::string testFileName = "test.bin";
    
        // write
        {
            SerializeTestStruct testStruct;
            testStruct.mCharVal = 'y';
            testStruct.mIntVal = 9;
    
            // Open the file once, in binary mode. Calling open() on a stream
            // that is already open fails and sets failbit.
            ::std::ofstream fileOut(testFileName.c_str(),
                                    std::ofstream::binary|std::ofstream::out);
            testStruct.Serialize(fileOut);
        }
    
        // read
        {
            ::std::ifstream fileIn (testFileName.c_str(),
                                    std::ifstream::in|std::ifstream::binary);
            if (fileIn.is_open())
            {
                SerializeTestStruct testStruct =
                    SerializeTestStruct::Deserialize(fileIn);
    
                ::std::cout << "testStruct.mCharVal == '" << testStruct.mCharVal
                            << "' && testStruct.mIntVal == " << testStruct.mIntVal
                            << '\n';
            }
        }
        return 0;
    }
    

    Style issues:

    • Don't use new to create things if you can help it. Stack allocated objects are usually what you want and significantly easier to manage than the arbitrary lifetime objects you allocate from the heap. If you do use new, consider using a smart pointer type of some kind to help manage the lifetime for you.
    • Serialization and deserialization code should be matched up so that they can be examined and altered together. This makes maintenance of such code much easier.
    • Rely on C++ to clean things up for you with destructors; that's what they're for. This means wrapping parts of your code in their own blocks when the variables they use have a relatively confined scope.
    • Don't needlessly use flags such as isError. Test the stream state directly, as in if (os.good()).
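
One way to keep a Serialize/Deserialize pair matched and easy to check is to round-trip a value through an in-memory stream. The sketch below assumes the same two-field struct as above. RoundTrip is a hypothetical helper name that is not part of the original code, and std::stringstream stands in for the file.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(std::ostream &os) const;
    static SerializeTestStruct Deserialize(std::istream &is);
};

// Writes the two fields one after the other, with no padding between them.
void SerializeTestStruct::Serialize(std::ostream &os) const
{
    os.write(reinterpret_cast<const char *>(&mCharVal), sizeof(mCharVal));
    os.write(reinterpret_cast<const char *>(&mIntVal), sizeof(mIntVal));
}

// Reads the fields back in exactly the order Serialize wrote them.
SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
{
    SerializeTestStruct retval = {};
    is.read(reinterpret_cast<char *>(&retval.mCharVal), sizeof(retval.mCharVal));
    is.read(reinterpret_cast<char *>(&retval.mIntVal), sizeof(retval.mIntVal));
    if (is.fail()) {
        throw std::runtime_error("failed to read full struct");
    }
    return retval;
}

// Hypothetical helper: serialize into memory, then deserialize from it.
SerializeTestStruct RoundTrip(const SerializeTestStruct &in)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    in.Serialize(buffer);
    assert(buffer.good());
    return SerializeTestStruct::Deserialize(buffer);
}
```

Because both functions take the base stream types, the same code works unchanged with std::ofstream and std::ifstream.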

    Mistakes...

    • Don't use the data member function of ::std::string to get a file name. Use c_str(), which is guaranteed to return a null-terminated string. Before C++11, data() made no such guarantee.
    • Using placement new and that memory block thing is a really bad idea because it's ridiculously complex. And if you did use it, you must not call delete[] on the pointer it returns. The memory came from new char[], so it has to be released with delete[] on memblock. And lastly, it won't work anyway for a reason explained later.
    • Do not use ofstream in the type taken by your Serialize function as it is a derived class whose features you don't need. You should always use the most base class in a hierarchy that has the features you need unless you have a very specific reason not to. Serialize works fine with the features of the base ostream class, so use that type instead.
    • The on-disk layout of your structure and the in memory layout do not match, so your placement new technique is doomed to fail. As a rule, if you have a serialize function, you need a matching deserialize function.
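
There is one more reason the placement new in the question ends up with 0, 0: writing SerializeTestStruct() with the trailing parentheses value-initializes the object, and for a struct with no user-provided constructor that zeroes its members, overwriting whatever bytes were in the buffer. The following is a minimal sketch of that effect. ConstructOverBuffer is a hypothetical helper, and the buffer is filled with 0xFF only so that the zeroing is visible.

```cpp
#include <cassert>
#include <cstring>
#include <new>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
};

// Hypothetical helper: fill a buffer with non-zero bytes, then construct a
// SerializeTestStruct on top of it the way the question does.
SerializeTestStruct ConstructOverBuffer()
{
    alignas(SerializeTestStruct) unsigned char buffer[sizeof(SerializeTestStruct)];
    std::memset(buffer, 0xFF, sizeof(buffer));

    // The trailing () requests value-initialization, which zeroes the members
    // of a struct that has no user-provided constructor.
    SerializeTestStruct *overlay = new (buffer) SerializeTestStruct();

    // Placement new constructs the object at the address it was given.
    assert(static_cast<void *>(overlay) == static_cast<void *>(buffer));
    return *overlay;
}
```

The returned copy has both members equal to zero even though every byte of the buffer was 0xFF beforehand, which matches what the question observed.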

    Here is a further explanation of your memory layout issue. The structure, in memory, on an x86_64 based Linux box looks like this:

    +------------+-----------+
    |Byte number | contents  |
    +============+===========+
    |          0 |     0x79  |
    |            | (aka 'y') |
    +------------+-----------+
    |          1 |   padding |
    +------------+-----------+
    |          2 |   padding |
    +------------+-----------+
    |          3 |   padding |
    +------------+-----------+
    |          4 |         9 |
    +------------+-----------+
    |          5 |         0 |
    +------------+-----------+
    |          6 |         0 |
    +------------+-----------+
    |          7 |         0 |
    +------------+-----------+
    

    The contents of the padding bytes are unspecified, though they are often 0. It doesn't matter because that space is never used; it exists only so that the following int starts on an efficient 4-byte boundary.

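    The size of your structure on disk is 5 bytes, and is completely missing the padding bytes. On disk the int starts at offset 1, but the in-memory structure expects it at offset 4. So when you read the file into memory and overlay the structure on it, mIntVal is assembled from the wrong bytes, three of them past the end of the 5-byte buffer, and accessing it is likely to cause some kind of horrible problem.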
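
If you want to see the mismatch on your own compiler, a small sketch along these lines compares the two layouts using sizeof and offsetof. The function names are hypothetical, and the values they return depend on the platform and compiler.

```cpp
#include <cassert>
#include <cstddef>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
};

// Bytes Serialize writes: the two fields back to back, with no padding.
std::size_t OnDiskSize()
{
    return sizeof(char) + sizeof(unsigned int);
}

// Bytes the compiler reserves for the struct in memory, padding included.
std::size_t InMemorySize()
{
    return sizeof(SerializeTestStruct);
}

// Offset of mIntVal from the start of the struct in memory.
std::size_t IntOffsetInMemory()
{
    return offsetof(SerializeTestStruct, mIntVal);
}

// Number of bytes in memory that never reach the file.
std::size_t PaddingBytes()
{
    assert(InMemorySize() >= OnDiskSize());
    return InMemorySize() - OnDiskSize();
}
```

With the x86_64 Linux layout shown in the table, OnDiskSize() would be 5, InMemorySize() 8, IntOffsetInMemory() 4 and PaddingBytes() 3. Other platforms may give different numbers, which is exactly the problem.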

    First rule: if you need a serialize function, you need a deserialize function. Second rule: unless you really know exactly what you are doing, do not dump raw memory into a file. This will work just fine in many cases, but there are important cases in which it won't work. And unless you are aware of what does and doesn't work, and when it does or doesn't work, you will end up with code that seems to work OK in certain test situations, but fails miserably when you try to use it in a real system.

    My code still does dump memory into a file. And it should work as long as you read the result back on exactly the same architecture and platform with code compiled with the same version of the compiler as when you wrote it. As soon as one of those variables changes, all bets are off.
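
If the file has to survive a change of compiler or architecture, one common alternative is to define the on-disk format explicitly and write each field byte by byte. Below is a minimal sketch under stated assumptions: the format (1 byte for the char, then 4 bytes for the int, least significant byte first) is made up for this example, the names SerializePortable and DeserializePortable are hypothetical, and bytes are assumed to be 8 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
};

// Assumed format: 1 byte for mCharVal, then mIntVal as 4 bytes, least
// significant byte first, regardless of the host's byte order.
void SerializePortable(const SerializeTestStruct &value, std::ostream &os)
{
    // Any bits above the low 32 are dropped if unsigned int is wider.
    const std::uint32_t intVal =
        static_cast<std::uint32_t>(value.mIntVal & 0xFFFFFFFFu);
    unsigned char bytes[5];
    bytes[0] = static_cast<unsigned char>(value.mCharVal);
    for (int i = 0; i < 4; ++i) {
        bytes[1 + i] = static_cast<unsigned char>((intVal >> (8 * i)) & 0xFFu);
    }
    os.write(reinterpret_cast<const char *>(bytes), sizeof(bytes));
}

// Reads the same 5 bytes back and reassembles the int from its 4 bytes.
SerializeTestStruct DeserializePortable(std::istream &is)
{
    unsigned char bytes[5];
    is.read(reinterpret_cast<char *>(bytes), sizeof(bytes));
    if (is.fail()) {
        throw std::runtime_error("failed to read full struct");
    }
    assert(is.gcount() == static_cast<std::streamsize>(sizeof(bytes)));

    std::uint32_t intVal = 0;
    for (int i = 0; i < 4; ++i) {
        intVal |= static_cast<std::uint32_t>(bytes[1 + i]) << (8 * i);
    }
    SerializeTestStruct retval;
    retval.mCharVal = static_cast<char>(bytes[0]);
    retval.mIntVal = intVal;
    return retval;
}
```

The file is always 5 bytes with the same contents, so it no longer depends on the compiler's padding or the machine's byte order. The cost is that every field has to be encoded and decoded by hand.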