
How to open a bin file in Python using QDataStream


I have a binary (.bin) file written by an application, and I need to read its contents and convert them to a CSV file. I've been given documentation for the format, but I'm not sure how to access the contents of the file in Python.

Here are some of the details about how the dataset was serialized:

datasets.bin is a list of DataSet objects serialized with Qt's QDataStream, using stream version QDataStream::Qt_4_7.

The format of the datasets.bin file is:

quint32  Magic Number    0x46474247
quint32  Version         1
quint32  DataSet Marker  0x44415441
qint32   # of DataSets   n
DataSet  DataSet 1
DataSet  DataSet 2
   .
   .
   .
DataSet  DataSet n


The format of each DataSet is:

quint32           Magic Number  0x53455455
QString           Name
quint32           Flags         Bit field (Set Table)
QString           Id            [Optional]
QColor            Color         [Optional]
qint32            Units         [Optional]
QStringList       Creator Ids   [Optional]
bool              Hidden        [Optional]
QList<double>     Thresholds    [Optional]
QString           Source        [Optional]
qint32            Role          [Optional]
QVector<QPointF>  Data points

I've been looking into the PyQt4 QDataStream documentation, but I can't find any specific examples. Any help pointing me in the right direction would be great.


Solution

  • PyQt cannot read all of the data the same way C++ does, because it cannot handle template classes (like QList<double> and QVector<QPointF>); that would require language-specific support that is not available in Python. This means a work-around must be used. Fortunately, the datastream format is quite straightforward: a container is stored as its length followed by its elements in order. Reading an arbitrary template class therefore reduces to a simple algorithm: read the length as a uint32, then iterate over a range and read the contained elements one by one into a list:

    # QVector<QPointF>: a uint32 element count, then the elements in order
    points = []
    length = stream.readUInt32()
    for index in range(length):
        point = QtCore.QPointF()
        stream >> point
        points.append(point)
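
To make the length-then-elements layout concrete, here is a minimal sketch that decodes a QList<double> from raw bytes with Python's struct module instead of PyQt. It assumes QDataStream's default settings (big-endian byte order, and doubles written as 64-bit values, which is the default from Qt 4.6 onwards); read_double_list is a hypothetical helper name, not part of PyQt.

```python
import struct

def read_double_list(buf, offset=0):
    """Decode a QList<double> from raw bytes; return (values, next_offset).

    Assumes the QDataStream defaults: big-endian, 64-bit doubles.
    """
    # Element count: unsigned 32-bit integer, big-endian.
    (count,) = struct.unpack_from('>I', buf, offset)
    offset += 4
    # Elements: `count` IEEE 754 doubles, big-endian, 8 bytes each.
    values = list(struct.unpack_from('>%dd' % count, buf, offset))
    offset += 8 * count
    return values, offset

# Build a three-element list by hand and decode it again.
payload = struct.pack('>I3d', 3, 0.5, 1.0, 2.5)
values, end = read_double_list(payload)
print(values, end)  # [0.5, 1.0, 2.5] 28
```

The PyQt loop above does the same thing, except that QDataStream keeps track of the offset and decodes each element for you.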
    

    Below is a script that shows how to read the whole dataset format correctly:

    from PyQt4 import QtCore, QtGui
    
    FLAG_HASSOURCE = 0x0001
    FLAG_HASROLE = 0x0002
    FLAG_HASCOLOR = 0x0004
    FLAG_HASID = 0x0008
    FLAG_COMPRESS = 0x0010
    FLAG_HASTHRESHOLDS = 0x0020
    FLAG_HASUNITS = 0x0040
    FLAG_HASCREATORIDS = 0x0080
    FLAG_HASHIDDEN = 0x0100
    FLAG_HASMETADATA = 0x0200
    
    MAGIC_NUMBER = 0x46474247
    FILE_VERSION = 1
    DATASET_MARKER = 0x44415441
    DATASET_MAGIC = 0x53455455
    
    def read_data(path):
        infile = QtCore.QFile(path)
        if not infile.open(QtCore.QIODevice.ReadOnly):
            raise IOError(infile.errorString())
    
        stream = QtCore.QDataStream(infile)
        # match the stream version the file was written with before reading
        stream.setVersion(QtCore.QDataStream.Qt_4_7)
    
        # file header: magic number, version, dataset marker, dataset count
        magic = stream.readUInt32()
        if magic != MAGIC_NUMBER:
            raise IOError('invalid magic number')
        version = stream.readUInt32()
        if version != FILE_VERSION:
            raise IOError('invalid file version')
        marker = stream.readUInt32()
        if marker != DATASET_MARKER:
            raise IOError('invalid dataset marker')
        count = stream.readInt32()
        if count < 1:
            raise IOError('invalid dataset count')
    
        rows = []
        while not stream.atEnd():
            row = []
    
            magic = stream.readUInt32()
            if magic != DATASET_MAGIC:
                raise IOError('invalid dataset magic number')
    
            row.append(('Name', stream.readQString()))
    
            flags = stream.readUInt32()
            row.append(('Flags', flags))
    
            if flags & FLAG_HASID:
                row.append(('ID', stream.readQString()))
            if flags & FLAG_HASCOLOR:
                color = QtGui.QColor()
                stream >> color
                row.append(('Color', color))
            if flags & FLAG_HASUNITS:
                row.append(('Units', stream.readInt32()))
            if flags & FLAG_HASCREATORIDS:
                row.append(('Creators', stream.readQStringList()))
            if flags & FLAG_HASHIDDEN:
                row.append(('Hidden', stream.readBool()))
            if flags & FLAG_HASTHRESHOLDS:
                thresholds = []
                length = stream.readUInt32()
                for index in range(length):
                    thresholds.append(stream.readDouble())
                row.append(('Thresholds', thresholds))
            if flags & FLAG_HASSOURCE:
                row.append(('Source', stream.readQString()))
            if flags & FLAG_HASROLE:
                row.append(('Role', stream.readInt32()))
    
            points = []
            length = stream.readUInt32()
            for index in range(length):
                point = QtCore.QPointF()
                stream >> point
                points.append(point)
            row.append(('Points', points))
            rows.append(row)
    
        infile.close()
    
        return rows
    
    rows = read_data('datasets.bin')
    
    for index, row in enumerate(rows):
        print('Row %s:' % index)
        for key, data in row:
            if isinstance(data, list) and len(data):
                print('  %s = [%s ... ] (%s items)' % (
                      key, repr(data[:3])[1:-1], len(data)))
            else:
                print('  %s = %s' % (key, data))
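
Since the goal in the question is a CSV file, one possible way to flatten the rows returned by read_data is sketched below. This is only an illustration: write_points_csv and the name,x,y column layout are hypothetical choices, not something the file format prescribes, and the only thing assumed about each point is that it exposes x() and y() the way QPointF does.

```python
import csv

def write_points_csv(rows, out):
    """Write one CSV line per data point: dataset name, x, y.

    `rows` is the list returned by read_data(): each row is a list of
    (key, value) pairs that includes 'Name' and 'Points'.
    `out` is any writable text file object.
    """
    writer = csv.writer(out)
    writer.writerow(['name', 'x', 'y'])
    for row in rows:
        fields = dict(row)  # look fields up by key instead of by position
        for point in fields['Points']:
            writer.writerow([fields['Name'], point.x(), point.y()])
```

With the script above, this could be called with the result of read_data('datasets.bin') and a file opened for writing (on Python 3, open it with newline='' as the csv module recommends). On Python 2 with PyQt4's default API, names come back as QString objects and may need converting to a Python string first. The optional fields (Color, Units, Thresholds and so on) are left out here; how to lay those out in columns depends on what the CSV is for.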