
python: c# binary datetime encoding


I need to extract financial price data from a binary file. This price data is normally extracted by a piece of C# code. The biggest problem I'm having is getting a meaningful datetime.

The binary data looks like this:

'\x14\x11\x00\x00{\x14\xaeG\xe1z(@\x9a\x99\x99\x99\x99\x99(@q=\n\xd7\xa3p(@\x9a\x99\x99\x99\x99\x99(@\xac\x00\x19\x00\x00\x00\x00\x00\x08\x01\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'

The C# code that extracts it correctly is:

StockID = reader.ReadInt32();
Open = reader.ReadDouble();
High = reader.ReadDouble();
Low = reader.ReadDouble();
Close = reader.ReadDouble();
Volume = reader.ReadInt64();
TotalTrades = reader.ReadInt32();
Timestamp = reader.ReadDateTime();

This is where I've gotten so far in Python:

In [1]: barlength = 56; barformat = 'i4dqiq'
In [2]: pricebar = f.read(barlength)
In [3]: pricebar
Out[3]: '\x95L\x00\x00)\\\x8f\xc2\xf5\xc8N@D\x1c\xeb\xe26\xcaN@\x7fj\xbct\x93\xb0N@\xd7\xa3p=\n\xb7N@\xf6\xdb\x02\x00\x00\x00\x00\x00J\x03\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'
In [4]: struct.unpack(barformat, pricebar)
Out[4]: 
(19605,                # stock id
 61.57,                # open
 61.579800000000006,   # high
 61.3795,              # low
 61.43,                # close
 187382,               # volume -- seems reasonable
 842,                  # TotalTrades -- seems reasonable
 634124502600000000L   # datetime -- no idea what this means!
)

I used Python's built-in struct module, but I have some concerns about it:

  1. I'm not sure which format characters correspond to Int32 vs Int64 in the C# code, though several different tries returned the same Python tuple.

  2. The output for some of the fields doesn't seem to be very sensitive to the format I specify. For example, the TotalTrades field returns the same value whether I specify it as a signed or unsigned int or a signed or unsigned long (l, L, i, or I).

  3. I can't make any sense of the date return field. This is actually my biggest problem.


Solution

  • As far as I know, .NET timestamps are ticks since 0001-01-01T00:00:00Z, where a tick is 100 nanoseconds. The tick count is stored as a 62-bit value, and the upper 2 bits of the 64-bit word indicate whether the timestamp is UTC or local. So:

    >>> x = 634124502600000000
    >>> x = x & 0x3FFFFFFFFFFFFFFF
    >>> secs = x / 10.0 ** 7
    >>> secs
    63412450260.0
    >>> import datetime
    >>> delta = datetime.timedelta(seconds=secs)
    >>> delta
    datetime.timedelta(733940, 34260)
    >>> ts = datetime.datetime(1,1,1) + delta
    >>> ts
    datetime.datetime(2010, 6, 18, 9, 31)
    >>>
    

    The date part is 2010-06-18. Are you in a timezone that's 9.5 hours away from UTC? It would help in verifying this calculation if you supplied two timestamp values together with the expected answers.
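
The calculation above could be wrapped in a small helper. This is only a sketch in Python 3: the name `dotnet_ticks_to_datetime` and the constants are made up for illustration, and the meaning of the two flag bits is taken from the description above rather than verified against the file's writer, so the helper returns the flag as a plain number instead of interpreting it.

```python
from datetime import datetime, timedelta

TICKS_MASK = 0x3FFFFFFFFFFFFFFF    # low 62 bits: the tick count
DOTNET_EPOCH = datetime(1, 1, 1)   # tick 0 is 0001-01-01 00:00:00


def dotnet_ticks_to_datetime(raw):
    """Split a raw 64-bit value into (naive datetime, flag bits).

    `raw` may be negative if it was unpacked as a signed 64-bit integer
    with a flag bit set; Python's bit operators handle that correctly.
    """
    flags = (raw >> 62) & 0x3      # upper 2 bits (assumed to be the UTC/local flag)
    ticks = raw & TICKS_MASK
    # One tick is 100 ns, so 10 ticks make one microsecond. Integer
    # division avoids float rounding; anything below 1 microsecond is dropped.
    return DOTNET_EPOCH + timedelta(microseconds=ticks // 10), flags


if __name__ == "__main__":
    print(dotnet_ticks_to_datetime(634124502600000000))
```

For the value in the question this gives `datetime.datetime(2010, 6, 18, 9, 31)` with flag bits 0, matching the interactive session above. No timezone conversion is attempted.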

    Addressing your concern that the output for some of the fields doesn't seem to be very sensitive to the format you specify (for example, TotalTrades returns the same value for l, L, i, or I): the fields are not sensitive because (1) in the struct module "long" and "int" are the same size (32 bits) in standard mode, and also in native mode on platforms where a C long is 32 bits, and (2) the lower half of the unsigned range has the same representation as the non-negative signed numbers. For example, in 8-bit numbers, the values 0 to 127 inclusive have the same bit pattern whether signed or unsigned.
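
To make the format question concrete, here is a hedged Python 3 sketch that unpacks the 56-byte record from the question. It assumes the file was written by .NET's BinaryWriter, which stores primitives little-endian with no padding, so Int32 maps to `i`, Double to `d`, and Int64 to `q`. The `<` prefix is an addition to the question's format string: without it, struct uses native alignment and may insert padding on some platforms, so the format would no longer describe 56 bytes. The variable names are illustrative only.

```python
import struct

# '<' = little-endian, standard sizes, no alignment padding.
BAR_FORMAT = "<i4dqiq"   # Int32, 4 x Double, Int64, Int32, Int64

# The record from the question, written out as hex.
record = bytes.fromhex(
    "954c0000"            # StockID
    "295c8fc2f5c84e40"    # Open
    "441cebe236ca4e40"    # High
    "7f6abc7493b04e40"    # Low
    "d7a3703d0ab74e40"    # Close
    "f6db020000000000"    # Volume
    "4a030000"            # TotalTrades
    "0022d818e0dccc08"    # Timestamp (raw 64-bit value)
)

(stock_id, open_, high, low, close,
 volume, total_trades, raw_timestamp) = struct.unpack(BAR_FORMAT, record)

# TotalTrades occupies bytes 44..47. Signed and unsigned codes agree here
# because the value is small and non-negative.
trades_bytes = record[44:48]
as_signed = struct.unpack("<i", trades_bytes)[0]
as_unsigned = struct.unpack("<I", trades_bytes)[0]

# They only differ once the top bit is set.
top_bit_signed = struct.unpack("<i", b"\xff\xff\xff\xff")[0]
top_bit_unsigned = struct.unpack("<I", b"\xff\xff\xff\xff")[0]

if __name__ == "__main__":
    print(stock_id, open_, high, low, close, volume, total_trades, raw_timestamp)
    print(as_signed, as_unsigned, top_bit_signed, top_bit_unsigned)
```

Choosing the signed codes mirrors ReadInt32 and ReadInt64 in the C# reader. The unsigned codes would return the same values for this record, but not for one where a field has its top bit set.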