I am trying to serialize a Plain Old Datastructure using ifstream and ofstream and I wasn't able to get it to work. I then tried to reduce my problem to an ultra basic serialization of just a char and int and even that didn't work. Clearly I'm missing something at a core fundamental level.
For a basic structure:
struct SerializeTestStruct
{
char mCharVal;
unsigned int mIntVal;
void Serialize(std::ofstream& ofs);
};
With serialize function:
void SerializeTestStruct::Serialize(std::ofstream& ofs)
{
bool isError = (false == ofs.good());
if (false == isError)
{
ofs.write((char*)&mCharVal, sizeof(mCharVal));
ofs.write((char*)&mIntVal, sizeof(mIntVal));
}
}
Why would this fail with the following short program?
//ultra basic serialization test.
SerializeTestStruct* testStruct = new SerializeTestStruct();
testStruct->mCharVal = 'y';
testStruct->mIntVal = 9;
//write
std::string testFileName = "test.bin";
std::ofstream fileOut(testFileName.data());
fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
fileOut.clear();
testStruct->Serialize(fileOut);
fileOut.flush();
fileOut.close();
delete testStruct;
//read
char * memblock;
std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
if (fileIn.is_open())
{
// get length of file:
fileIn.seekg (0, std::ifstream::end);
int length = fileIn.tellg();
fileIn.seekg (0, std::ifstream::beg);
// allocate memory:
memblock = new char [length];
fileIn.read(memblock, length);
fileIn.close();
// read data as a block:
SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();
delete[] testStruct2;
}
When I run through the code I notice that memblock
has a "y" at the top so maybe it is working and it's just a problem with the placement new
at the very end? After that placement new I end up with a SerializeTestStruct
with values: 0, 0.
Here is how your stuff should read:
#include <fstream>
#include <string>
#include <stdexcept>
struct SerializeTestStruct
{
char mCharVal;
unsigned int mIntVal;
void Serialize(::std::ostream &os);
static SerializeTestStruct Deserialize(::std::istream &is);
};
void SerializeTestStruct::Serialize(std::ostream &os)
{
if (os.good())
{
os.write((char*)&mCharVal, sizeof(mCharVal));
os.write((char*)&mIntVal, sizeof(mIntVal));
}
}
SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
{
SerializeTestStruct retval;
if (is.good())
{
is.read((char*)&retval.mCharVal, sizeof(retval.mCharVal));
is.read((char*)&retval.mIntVal, sizeof(retval.mIntVal));
}
if (is.fail()) {
throw ::std::runtime_error("failed to read full struct");
}
return retval;
}
int main(int argc, const char *argv[])
{
//ultra basic serialization test.
// setup
const ::std::string testFileName = "test.bin";
// write
{
SerializeTestStruct testStruct;
testStruct.mCharVal = 'y';
testStruct.mIntVal = 9;
::std::ofstream fileOut(testFileName.c_str());
fileOut.open(testFileName.c_str(),
std::ofstream::binary|std::ofstream::out);
fileOut.clear();
testStruct.Serialize(fileOut);
}
// read
{
::std::ifstream fileIn (testFileName.c_str(),
std::ifstream::in|std::ifstream::binary);
if (fileIn.is_open())
{
SerializeTestStruct testStruct = \
SerializeTestStruct::Deserialize(fileIn);
::std::cout << "testStruct.mCharVal == '" << testStruct.mCharVal
<< "' && testStruct.mIntVal == " << testStruct.mIntVal
<< '\n';
}
}
return 0;
}
Style issues:
new
to create things if you can help it. Stack allocated objects are usually what you want and significantly easier to manage than the arbitrary lifetime objects you allocate from the heap. If you do use new
, consider using a smart pointer type of some kind to help manage the lifetime for you.Mistakes...
data
member function of ::std::string
.new
and that memory block thing is really bad idea because it's ridiculously complex. And if you did use it, then you do not use array delete in the way you did. And lastly, it won't work anyway for a reason explained later.ofstream
in the type taken by your Serialize
function as it is a derived class who's features you don't need. You should always use the most base class in a hierarchy that has the features you need unless you have a very specific reason not to. Serialize
works fine with the features of the base ostream
class, so use that type instead.serialize
function, you need a matching deserialize
function.Here is a further explanation of your memory layout issue. The structure, in memory, on an x86_64 based Linux box looks like this:
+------------+-----------+
|Byte number | contents |
+============+===========+
| 0 | 0x79 |
| | (aka 'y') |
+------------+-----------+
| 1 | padding |
+------------+-----------+
| 3 | padding |
+------------+-----------+
| 4 | padding |
+------------+-----------+
| 5 | 9 |
+------------+-----------+
| 6 | 0 |
+------------+-----------+
| 7 | 0 |
+------------+-----------+
| 8 | 0 |
+------------+-----------+
The contents of the padding
section are undefined, but generally 0
. It doesn't matter though because that space is never used and merely exists so that access to the following int
lies on an efficient 4-byte boundary.
The size of your structure on disk is 5 bytes, and is completely missing the padding sections. So that means when you read it into memory it won't line up properly with the in memory structure at all and accessing it is likely to cause some kind of horrible problem.
The first rule, if you need a serialize
function, you need a deserialize
function. Second rule, unless you really know exactly what you are doing, do not dump raw memory into a file. This will work just fine in many cases, but there are important cases in which it won't work. And unless you are aware of what does and doesn't work, and when it does or doesn't work, you will end up code that seems to work OK in certain test situations, but fails miserable when you try to use it in a real system.
My code still does dump memory into a file. And it should work as long as you read the result back on exactly the same architecture and platform with code compiled with the same version of the compiler as when you wrote it. As soon as one of those variables changes, all bets are off.