Search code examples
pythonpandasdatetimedatetime-formatccxt

What datetime format is this and how do I parse it?


I have some data that I'm pulling from an API and the date is formatted like this: '1522454400000'

Not sure how to parse it but this is what I have (unsuccessfully tried)

df = DataFrame(test)
df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
df.set_index('Date')
df.index = pd.to_datetime(df.index, unit = 'd')

where the variable test is a list of the underlying data. this incorrectly parses the data as year being 1970.

The result of the parse:

1970-01-01 00:00:00.000000000

Any ideas?

********************** EDIT ************************************

Python version: 3

Pandas version. 0.23.0

Here is a working example for reproducibility. But first, here are the facts I have discovered.

DATE FORMAT: 64-bit Unix Timestamp in milliseconds since Epoch 1 Jan 1970

TIMEZONE: UTC

MY TIMEZONE: UTC + 4 (the desired datetime index)

The code:

import bitmex
import pandas as pd
from pandas import DataFrame
import datetime
import ccxt

api_connector = ccxt.bitmex({
    'enableRateLimit': True
})

#get OHLCV Data
testdata = api_connector.fetch_ohlcv('XBTZ18', '1h')

df2 = DataFrame(testdata)
df2.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
#df2.set_index('Date')
df2.index = pd.to_datetime(df2.Date, unit='ms')
df3 = df2.drop(['Date'],
              axis =1)
df3.tail()

This returns:

Open    High    Low Close   Volume
Date                    
2018-07-06 00:00:00 6538.5  6555.0  6532.5  6537.0  176836
2018-07-06 01:00:00 6537.0  6535.5  6520.5  6524.5  139735
2018-07-06 02:00:00 6524.5  6542.5  6525.5  6542.5  59759
2018-07-06 03:00:00 6542.5  6545.0  6538.0  6538.0  121410
2018-07-06 04:00:00 6538.0  6538.5  6477.5  6525.0  764125

Close! but no cigar. Today's date is 8/31/2018 so I would at least expect it to be in the correct month.

What am I doing wrong, folks?


Solution

  • This is almost certainly a variation on "Unix time": instead of seconds since the 1 Jan 1970 epoch, it's milliseconds since the 1 Jan 1970 epoch:

    >>> datetime.datetime.utcfromtimestamp(int('1522454400000') / 1000)
    datetime.datetime(2018, 3, 31, 0, 0)
    

    That certainly looks like a reasonable date. And it even looks like it probably is UTC, not local time (unless you happen to be in England, or weren't expecting it to be exactly at midnight).


    I don't think any of Pandas' built-in formats (which are actually just wrappers around formats from datetime and/or dateutil) exactly matches this, so you'll probably need to either do what I did about (convert to int and treat it as a number) or do the stringy equivalent (chop off the last 3 characters and then treat as a string of a UNIX timestamp).

    The first one seems simpler:

    >>> pd.to_datetime(int('1522454400000'), unit='ms')
    Timestamp('2018-03-31 00:00:00')
    

    In fact, it'll even work directly on strings, doing the conversion implicitly:

    >>> pd.to_datetime('1522454400000', unit='ms')
    Timestamp('2018-03-31 00:00:00')