Tags: python, gps, binary-data

How to read and extract data from a binary data file with multiple variable-length records?


Using Python (3.1 or 2.6), I'm trying to read data from binary data files produced by a GPS receiver. Data for each hour is stored in a separate file, each of which is about 18 MiB. The data files have multiple variable-length records, but for now I need to extract data from just one of the records.

I've got as far as being able to decode, somewhat, the header. I say somewhat because some of the numbers don't make sense, but most do. After spending a few days on this (I've started learning to program using Python), I'm not making progress, so it's time to ask for help.

The reference guide gives me the message header structure and the record structure. Headers can be variable length but are usually 28 bytes.

Header
Field #  Field Name    Field Type    Desc                 Bytes    Offset
1        Sync          char          Hex 0xAA             1        0
2        Sync          char          Hex 0x44             1        1
3        Sync          char          Hex 0x12             1        2
4        Header Lgth   uchar         Length of header     1        3
5        Message ID    ushort        Message ID of log    2        4
8        Message Lgth  ushort        length of message    2        8
11       Time Status   enum          Quality of GPS time  1        13
12       Week          ushort        GPS week number      2        14
13       Milliseconds  GPSec         Time in ms           4        16


Record
Field #  Data                        Bytes         Format     Units       Offset
1        Header                                                           0
2        Number of SV Observations   4             integer    n/a         H
         *For first SV Observation*  
3        PRN                         4             integer    n/a         H+4
4        SV Azimuth angle            4             float      degrees     H+8
5        SV Elevation angle          4             float      degrees     H+12
6        C/N0                        8             double     db-Hz       H+16
7        Total S4                    8             double     n/a         H+24
...
27       L2 C/N0                     8             double     db-Hz       H+148
28       *For next SV Observation*
         SV Observation is satellite - there could be anywhere from 8 to 13 
         in view.

Here's my code for trying to make sense of the header:

import struct

filename = "100301_110000.nvd"

# Read the first 28 bytes, the usual header length.
with open(filename, "rb") as f:
    s = f.read(28)

# "<" means little-endian with no padding; one format character per header field.
(x, y, z, lgth, msg_id, mtype, port, mlgth, seq, idletime,
 timestatus, week, millis, recstatus, reserved, version) = struct.unpack(
    "<cccBHcBHHBcHLLHH", s)

print(x, y, z, lgth, msg_id, mtype, port, mlgth, seq, idletime,
      timestatus, week, millis, recstatus, reserved, version)

It outputs:

b'\xaa' b'D' b'\x12' 28 274 b'\x02' 32 1524 0 78 b'\xa0' 1573 126060000 10485760 3545 35358

The three sync fields should return 0xAA, 0x44 and 0x12. (I assume D is just the ASCII equivalent of 0x44.)

The record ID for which I'm looking is 274 - that seems correct.

GPS week is returned as 1573 - that seems correct.

Milliseconds is returned as 126060000 - I was expecting 126015000.

How do I go about finding the records identified as 274 and extracting them? (I'm learning Python, and programming, so keep in mind the answer you give an experienced coder might be over my head.)


Solution

  • 18 MB should fit comfortably in memory, so I'd just read the whole file into one big string of bytes in a single step (with open(thefile, 'rb') as f: data = f.read()) and then do all the "parsing" on slices of it, advancing record by record. That's more convenient, and may well be faster, than doing many small reads from here and there in the file. It doesn't affect the logic below, though: either way, the "current point of interest" in the data always moves forward, by amounts computed from struct-unpacking a few bytes at a time to find the lengths of headers and records.

    Given the offset of the start of a record (which is also the start of its header), you can find the header's length by looking at a single byte (field 4, at offset 3) and check the message ID (the next field, 2 bytes) to see whether it's the record you care about. A struct unpack of just those 3 bytes suffices for both.

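    Whether it's the record you want or not, you next need to compute the record's length (either to skip it or to get it all). For that, find the start of the actual record data (start of record plus length of header), read the first field of the record (the 4 bytes right after the header, which hold the number of observations), and multiply that by the length of one observation. By your table that is 152 bytes, not 32: the first observation starts at H+4, and its last field, an 8-byte double at H+148, ends at H+156. Add the 4 bytes of the count itself to get the record's length. The header's Message Lgth field (2 bytes at offset 8) should give the same number directly, and it also works for record types whose layout you don't know: your sample header reports 1524, which is 4 + 10 × 152, i.e. ten observations.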

    This way you either isolate the substring to be given to struct.unpack (when you've finally reached the record you want), or just add the total length of header + record to the "start of record" offset, to get the offset for the start of the next record.
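
The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested parser for the real files. The names iter_records, parse_274 and OBS_LEN are made up for illustration. It assumes little-endian fields (as in the question's "<" format), that Message Lgth counts only the bytes after the header, and that the "integer" fields are unsigned 32-bit. Only the fields spelled out in the question's table are decoded; those hidden behind the "..." are skipped. Rather than assuming records sit back to back, it searches for the next sync sequence, which also steps over any trailing bytes (a checksum, say) that the tables don't describe.

```python
import struct

SYNC = b"\xaa\x44\x12"   # the three sync bytes from the header table
OBS_LEN = 152            # one SV observation: H+4 up to H+156 in the record table


def iter_records(data):
    """Yield (message_id, body) for each record found in data (a bytes object)."""
    pos = 0
    while True:
        pos = data.find(SYNC, pos)            # next candidate start of record
        if pos < 0 or pos + 10 > len(data):   # need header bytes 0..9
            return
        # Header length (1 byte, offset 3) and message ID (2 bytes, offset 4).
        hdr_len, msg_id = struct.unpack_from("<BH", data, pos + 3)
        # Message length (2 bytes, offset 8); assumed to exclude the header.
        (msg_len,) = struct.unpack_from("<H", data, pos + 8)
        body_start = pos + hdr_len
        body_end = body_start + msg_len
        if body_end > len(data):              # truncated last record
            return
        yield msg_id, data[body_start:body_end]
        # Always move forward, even if the length fields were nonsense.
        pos = max(body_end, pos + len(SYNC))


def parse_274(body):
    """Decode the documented fields of every observation in a 274 record body."""
    (n_obs,) = struct.unpack_from("<I", body, 0)   # number of SV observations
    observations = []
    for i in range(n_obs):
        start = 4 + i * OBS_LEN
        # PRN, azimuth, elevation, C/N0, Total S4 (first 28 bytes).
        prn, azimuth, elevation, cn0, total_s4 = struct.unpack_from(
            "<Iffdd", body, start)
        # L2 C/N0 is the last field, 144 bytes into the observation (H+148).
        (l2_cn0,) = struct.unpack_from("<d", body, start + 144)
        observations.append({
            "prn": prn, "azimuth": azimuth, "elevation": elevation,
            "cn0": cn0, "total_s4": total_s4, "l2_cn0": l2_cn0,
        })
    return observations
```

With data read from the file in one go, as described above, something like [parse_274(body) for msg_id, body in iter_records(data) if msg_id == 274] would collect the observations from every 274 record. A useful sanity check on real data is that len(body) equals 4 + n_obs * OBS_LEN for each such record; if it doesn't, one of the assumptions above is wrong.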