python, binary-data

Binary data to date/time stamp, unable to find the proper conversion using Python


First post, long-time lurker; I couldn't get the formatting the way I wanted it to be. Sorry.

I'm trying to convert part of a binary file to a date/time (in Python). But whatever I try I'm unable to find the proper conversion.

My guess is that the left byte (0x30) is not part of the data, and the remaining 8 bytes contain the relevant data.

Below are the binary parts, both in decimal and in Hex, and the date/time they represent. Any help is highly appreciated.

48  101 26  235 227 242 150 197 65
30  65  1a  eb  e3  f2  96  c5  41  
  -- should read as 16 December 2023 at 15:03

48  198 54  133 112 138 151 197 65
30  c6  36  85  70  8a  97  c5  41 
  -- should read as 17 December 2023 at 12:37

48  74  38  27  107 41  116 196 65
30  4a  26  1b  6b  29  74  c4  41 
  -- should read as 1 October 2022 at 12:49

I've tried to unpack the data as either a double or a long long int and then obtain a date from it. I've searched the site and tried ChatGPT, to no avail.
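
Roughly, this is what those unpacking attempts looked like (a sketch for reference; the variable names are just illustrative, and only the first sample is shown):

import struct

raw = bytes.fromhex('30651aebe3f296c541')  # first sample: 16 December 2023 at 15:03
payload = raw[1:]                          # drop the leading 0x30, keep the other 8 bytes

as_double, = struct.unpack('<d', payload)  # little-endian double
as_int, = struct.unpack('<q', payload)     # little-endian signed 64-bit integer

print(as_double)  # roughly 7.2e8, not a plausible "seconds since 1970" for a 2023 date
print(as_int)     # far too large to be a Unix timestamp in seconds or milliseconds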

Extra sample data

30  23  84  b1  a8  b5  97  c5  41 : 17 December 2023 at 18:45
30  3f  91  e7  96  b5  97  c5  41 : 17 December 2023 at 18:45 (slightly later)
30  a6  d6  2f  d1  b5  97  c5  41 : 17 December 2023 at 18:46
30  e8  16  9c  b9  b5  97  c5  41 : 17 December 2023 at 18:47

Solution

  • The reason I asked in the comments for some more examples, especially ones close to each other in time, was to see which parts of the binary values were changing. I considered several types of encoding (some even based on textual representations of the timestamps). I looked at temporenc. I looked at floating-point representations of seconds since the Epoch.

    But one thing struck me when looking at these three examples:

    {
        '30 65 1a eb e3 f2 96 c5 41': '16 December 2023 at 15:03',
        '30 c6 36 85 70 8a 97 c5 41': '17 December 2023 at 12:37',
        '30 23 84 b1 a8 b5 97 c5 41': '17 December 2023 at 18:45',
    }
    

    the c5 byte (2nd from right) is constant, while the 3rd byte from the right is 97 for Dec. 17 and 96 for Dec. 16.
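
    To see that at a glance, the samples can be lined up byte by byte (a small sketch; samples and columns are just local names):

    samples = [
        '30 65 1a eb e3 f2 96 c5 41',  # 16 December 2023 at 15:03
        '30 c6 36 85 70 8a 97 c5 41',  # 17 December 2023 at 12:37
        '30 23 84 b1 a8 b5 97 c5 41',  # 17 December 2023 at 18:45
    ]

    # transpose into per-position columns and report which byte positions vary
    columns = list(zip(*(s.split() for s in samples)))
    for i, col in enumerate(columns):
        print(i, col, 'constant' if len(set(col)) == 1 else 'changes')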

    Further, I started looking at the integer value of the bytes taken in reverse order (excluding the first and last bytes, which are constant and may be delimiters).

    I then noticed that the differences between the int values of two consecutive timestamps were roughly proportional to the time differences between them. The factor is close to 8_388_608, which is 2 ** 23.
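
    As a rough check of that factor (a sketch; middle_int is just a helper name for the reversed-middle-bytes integer):

    a = '30 65 1a eb e3 f2 96 c5 41'   # 16 December 2023 at 15:03
    b = '30 c6 36 85 70 8a 97 c5 41'   # 17 December 2023 at 12:37

    def middle_int(k):
        # drop the constant 0x30 / 0x41 bytes, reverse the rest, read as one integer
        return int(''.join(k.split()[1:-1][::-1]), 16)

    seconds_apart = 21 * 3600 + 34 * 60   # 15:03 on the 16th to 12:37 on the 17th
    print((middle_int(b) - middle_int(a)) / seconds_apart)   # roughly 8.38e6, close to 2 ** 23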

    Fast-forward to a few more steps, and we get:

    def f(k):
        # drop the first (0x30) and last (0x41) bytes, reverse the remaining seven,
        # read them as one integer, scale down by 2 ** 23 and shift the origin
        return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860
    

    That function gives a fairly good approximation of the timestamps provided, in seconds since the Epoch. One additional thing: there was a conspicuous 3600-second error for the October date, so I figured daylight saving time was involved in your dates. Since you are in Europe, I used Zurich's timezone.
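
    As a quick sanity check without pandas, the result of f can be turned into an aware datetime with the standard library (a sketch; 'Europe/Zurich' is the assumed timezone, and zoneinfo needs Python 3.9+):

    from datetime import datetime
    from zoneinfo import ZoneInfo

    def f(k):
        return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860

    k = '30 65 1a eb e3 f2 96 c5 41'   # expected: 16 December 2023 at 15:03
    print(datetime.fromtimestamp(f(k), tz=ZoneInfo('Europe/Zurich')))   # within ~30 s of 15:03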

    Put all together:

    import pandas as pd
    
    
    tz = 'Europe/Zurich'
    
    examples = {
        '30 65 1a eb e3 f2 96 c5 41': '16 December 2023 at 15:03',
        '30 c6 36 85 70 8a 97 c5 41': '17 December 2023 at 12:37',
        '30 4a 26 1b 6b 29 74 c4 41': '1 October 2022 at 12:49',
        '30 23 84 b1 a8 b5 97 c5 41': '17 December 2023 at 18:45',
        '30 3f 91 e7 96 b5 97 c5 41': '17 December 2023 at 18:45:30',
        '30 a6 d6 2f d1 b5 97 c5 41': '17 December 2023 at 18:46',
        '30 e8 16 9c b9 b5 97 c5 41': '17 December 2023 at 18:47',
    }
    
    # sort the samples chronologically by their given (parsed) timestamp
    examples = dict(sorted([
        (k, pd.Timestamp(v, tz=tz)) for k, v in examples.items()
    ], key=lambda item: item[1]))
    

    Then:

    def f(k):
        return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860
    
    def to_time(k, tz):
        # f(k) is in seconds; a bare number passed to pd.Timestamp is taken as nanoseconds
        return pd.Timestamp(f(k) * 1e9, tz=tz)
    
    fmt = '%F %T %Z'  # '%F %T' is shorthand for '%Y-%m-%d %H:%M:%S' (a platform strftime extension)
    
    test = [
        (
            f'{v:{fmt}}',  # given time
            f'{to_time(k, tz=tz):{fmt}}', # estimate from bytes
            (to_time(k, tz=tz) - v).total_seconds(), # difference in seconds
        )
        for k, v in examples.items()
    ]
    
    >>> test
    [('2022-10-01 12:49:00 CEST', '2022-10-01 12:49:30 CEST', 30.0),
     ('2023-12-16 15:03:00 CET', '2023-12-16 15:03:23 CET', 23.0),
     ('2023-12-17 12:37:00 CET', '2023-12-17 12:36:37 CET', -23.0),
     ('2023-12-17 18:45:00 CET', '2023-12-17 18:45:25 CET', 25.0),
     ('2023-12-17 18:45:30 CET', '2023-12-17 18:44:49 CET', -41.0),
     ('2023-12-17 18:46:00 CET', '2023-12-17 18:46:46 CET', 46.0),
     ('2023-12-17 18:47:00 CET', '2023-12-17 18:45:59 CET', -61.0)]
    

    Perhaps with more examples and more info, you may be able to adjust the constants used above. I tried to express the offset in terms of an origin date, but it wasn't satisfying. One approach I tried was:

    origin = pd.Timestamp('2018-01-05 18:48:33')
    offset = int(origin.value / 1e9)  # the origin expressed in seconds since the Epoch
    
    def f(k):
        # reverse all nine bytes, then strip the constant '41c' prefix and '30' suffix
        # from the resulting hex string before converting
        return (int(''.join(k.split()[::-1])[3:-2], 16) >> 23) + offset
    

    but I didn't find it much better from an "Occam's razor" perspective.