I'm looking to convert a UNIX timestamp object to pandas date time. I'm importing the timestamps from a separate source, which displays a date time of 21-01-22 00:01
for the first timepoint and 21-01-22 00:15
for the second time point. Yet my conversion is 10 hours behind these two. Is this related to the +1000 at the end of each string?
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/'],
})
df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')
Out:
Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00
Other source:
Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00
You could use a regex to extract Unix time and UTC offset, then parse Unix time to datetime and add the UTC offset as a timedelta, e.g.
import pandas as pd
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
})
df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
# datetime from unix first, leaves NaT for invalid values
df['datetime'] = pd.to_datetime(df['unix'], unit='ms')
# where datetime is not NaT, add the offset:
df.loc[~df['datetime'].isnull(), 'datetime'] += (
pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
)
# or without the apply, but by using an underscored method:
# df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') +
# pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
df['datetime']
# 0 2022-01-21 00:01:00
# 1 2022-01-21 00:15:00
# 2 NaT
# Name: datetime, dtype: datetime64[ns]
Unfortunately, you'll have to use an underscored ("private") method, if you want to avoid the apply
. This also only works if you have a constant offset, i.e. if it's the same offset throughout the whole series.