
Understanding the binary format of QDataStream


Using a small Kotlin tool I am writing, I am trying to decode a binary file written by CloudCompare. Since CloudCompare is open source, I figured this would be fairly simple, but I am stuck on one particular issue.

I am trying to decode the name of a CloudCompare entity and CloudCompare uses the following C++ code to decode it:

// Header file
QString m_name;

// Source file
bool ccObject::fromFile(QFile& in, short dataVersion, int flags, LoadedIDMap& oldToNewIDMap)
{
    // ...

    //name
    if (dataVersion < 22) //old style
    {
        // ...
    }
    else //(dataVersion>=22)
    {
        QDataStream inStream(&in);
        inStream >> m_name;
    }

    // ...
}

So, in essence, CloudCompare wraps the binary input file in a QDataStream and asks it to decode a QString. The documentation of QDataStream's binary format specifies that a QString is encoded as its length in bytes (a quint32) followed by the string data in UTF-16. However, the data I am trying to decode does not seem to match that. Here's the data in question:

[Image: binary data to be decoded]

When I try to decode the name, the input stream's seek position is at offset 0x0010. Following the docs, I would parse the next four bytes (0x77 0x04 0x00 0x00) as a big-endian unsigned int (which evaluates to 1996750848) and expect the actual UTF-16 string to start at offset 0x0014. But as you can see, the string only starts at offset 0x0018, and the bytes from 0x0014 to 0x0017 do decode to 14, the length of the UTF-16 string in bytes. So there seem to be 4 bytes, from offset 0x0010 to 0x0013, whose purpose I don't understand.
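The arithmetic in the paragraph above can be checked directly. The sketch below reads the same four bytes both ways: big-endian, as QDataStream does by default, and little-endian. The bytes at 0x0014 are assumed to be `00 00 00 0E`, since the question states they decode to 14; the helper names `u32BE` and `u32LE` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Interprets four bytes as a big-endian (most significant byte first) integer.
constexpr std::uint32_t u32BE(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
           (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}

// Interprets four bytes as a little-endian (least significant byte first) integer.
constexpr std::uint32_t u32LE(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t(b[3]) << 24) | (std::uint32_t(b[2]) << 16) |
           (std::uint32_t(b[1]) << 8) | std::uint32_t(b[0]);
}

// Bytes at offsets 0x10..0x13 and 0x14..0x17 of the file in the question.
// Assumption: the second group is 00 00 00 0E, matching the stated length of 14.
constexpr std::array<std::uint8_t, 4> at0x10{0x77, 0x04, 0x00, 0x00};
constexpr std::array<std::uint8_t, 4> at0x14{0x00, 0x00, 0x00, 0x0E};

static_assert(u32BE(at0x10) == 1996750848u, "the implausible 'length' from the question");
static_assert(u32LE(at0x10) == 1143u, "the same bytes read little-endian");
static_assert(u32BE(at0x14) == 14u, "the actual QString byte length");
```

Read little-endian, the mystery bytes give 1143, a plausible small integer rather than a string length, which hints that they were not written through QDataStream at all.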

My first guess was that QDataStream writes some header information there, but I couldn't find any documentation of such a header. So my question boils down to what these four bytes mean, and whether I need to parse them at all or can safely skip them.
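For reference, the QString layout from the QDataStream documentation can be decoded from a plain byte buffer without Qt. This is a minimal sketch assuming QDataStream's default big-endian byte order and a stream version of Qt 4 or later, where a null string is written as the length 0xFFFFFFFF; `readQDataStreamString` is a hypothetical helper, not a Qt API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a big-endian 32-bit unsigned integer (a quint32) starting at `pos`.
std::uint32_t readU32BE(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    if (pos + 4 > buf.size()) throw std::out_of_range("truncated quint32");
    return (std::uint32_t(buf[pos]) << 24) | (std::uint32_t(buf[pos + 1]) << 16) |
           (std::uint32_t(buf[pos + 2]) << 8) | std::uint32_t(buf[pos + 3]);
}

// Hypothetical helper: decodes a QString as serialized by QDataStream.
// Returns std::nullopt for a null QString (length 0xFFFFFFFF).
// `pos` is advanced past all consumed bytes.
std::optional<std::u16string> readQDataStreamString(const std::vector<std::uint8_t>& buf,
                                                    std::size_t& pos) {
    const std::uint32_t byteLen = readU32BE(buf, pos);  // length in BYTES, not characters
    pos += 4;
    if (byteLen == 0xFFFFFFFFu) return std::nullopt;   // null QString marker
    if (byteLen % 2 != 0) throw std::runtime_error("odd UTF-16 byte length");
    if (pos + byteLen > buf.size()) throw std::out_of_range("truncated string data");
    std::u16string out;
    out.reserve(byteLen / 2);
    for (std::uint32_t i = 0; i < byteLen; i += 2) {
        // Each UTF-16 code unit is stored big-endian.
        out.push_back(static_cast<char16_t>((buf[pos + i] << 8) | buf[pos + i + 1]));
    }
    pos += byteLen;
    return out;
}
```

Applied at offset 0x0014 of the file from the question, it would consume the length 14 and seven UTF-16 code units, leaving `pos` at 0x0026.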


Solution

  • Tracing through the CloudCompare source code tells me the following about the file format (each entry is offset+length, followed by the bytes found there):

    • 0x00+3 43 43 42 (CCB) specifies that CloudCompare's BinFilter should be used to read the file.
    • 0x03+1 32 is the "load flags" byte, whatever those may be.
    • 0x04+4 34 00 00 00 is the file version, stored little-endian: 0x34 = 52. The latest version at the time of writing appears to be 54.
    • 0x08+8 01 00 00 00 00 00 00 00 is the class ID of the object being read. 0x1 is an "HObject" (ccHObject).
    • Control transfers to ccHObject::fromFile. The first thing it does is call ccHObject::fromFileNoChildren, which calls ccObject::fromFile.
    • 0x10+4 77 04 00 00 is the "unique ID of this object", stored since data version 20 (read little-endian like the version field, it is 0x477 = 1143). ccObject::fromFile reads it before the name, in the part elided as // ... in the snippet from your question. This is the bit you overlooked.
    • 0x14 is the start of the serialized QString, as you showed in your question.
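Putting the layout above together, the first bytes of such a file could be read roughly as follows. This is a sketch under the assumptions stated in the answer, not CloudCompare's own loader: the raw fields (version, class ID, unique ID) are assumed little-endian, as reading `34 00 00 00` as 52 implies, while the name uses QDataStream's big-endian QString layout. The fixed offsets only hold for a file like this one (an HObject with data version >= 22); `CcbPrefix` and `parseCcbPrefix` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical summary of the fields listed above (offsets valid for this file only).
struct CcbPrefix {
    char loadFlags;          // 0x03
    std::uint32_t version;   // 0x04, little-endian
    std::uint64_t classId;   // 0x08, little-endian
    std::uint32_t uniqueId;  // 0x10, little-endian (the four "mystery" bytes)
    std::u16string name;     // 0x14, QDataStream QString (big-endian)
};

// Little-endian unsigned integer of `n` bytes at `pos`.
std::uint64_t readLE(const std::vector<std::uint8_t>& buf, std::size_t pos, int n) {
    if (pos + n > buf.size()) throw std::out_of_range("truncated field");
    std::uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | buf[pos + i];
    return v;
}

// Big-endian unsigned integer of `n` bytes at `pos` (QDataStream's default order).
std::uint64_t readBE(const std::vector<std::uint8_t>& buf, std::size_t pos, int n) {
    if (pos + n > buf.size()) throw std::out_of_range("truncated field");
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | buf[pos + i];
    return v;
}

CcbPrefix parseCcbPrefix(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 4 || buf[0] != 'C' || buf[1] != 'C' || buf[2] != 'B')
        throw std::runtime_error("not a CCB file");
    CcbPrefix p{};
    p.loadFlags = static_cast<char>(buf[3]);
    p.version = static_cast<std::uint32_t>(readLE(buf, 0x04, 4));
    p.classId = readLE(buf, 0x08, 8);
    p.uniqueId = static_cast<std::uint32_t>(readLE(buf, 0x10, 4));
    // QString: big-endian quint32 byte length (0xFFFFFFFF = null), then UTF-16BE data.
    const auto byteLen = static_cast<std::uint32_t>(readBE(buf, 0x14, 4));
    if (byteLen != 0xFFFFFFFFu) {
        if (byteLen % 2 != 0) throw std::runtime_error("odd UTF-16 byte length");
        for (std::uint32_t i = 0; i < byteLen; i += 2)
            p.name.push_back(static_cast<char16_t>(readBE(buf, 0x18 + i, 2)));
    }
    return p;
}
```

In Kotlin, the same split maps onto `java.nio.ByteBuffer`, switching `order(...)` between `ByteOrder.LITTLE_ENDIAN` for the raw fields and `ByteOrder.BIG_ENDIAN` for the QString.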