Search code examples
pythonpandasdatetimetimezonetimedelta

Convert column of unix objects to datetime - python


I'm looking to convert a UNIX timestamp object to pandas date time. I'm importing the timestamps from a separate source, which displays a date time of 21-01-22 00:01 for the first timepoint and 21-01-22 00:15 for the second time point. Yet my conversion is 10 hours behind these two. Is this related to the +1000 at the end of each string?

df = pd.DataFrame({      
    'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/'],
   })

df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')

Out:

                 Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00

Other source:

                 Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00

Solution

  • You could use a regex to extract Unix time and UTC offset, then parse Unix time to datetime and add the UTC offset as a timedelta, e.g.

    import pandas as pd
    
    df = pd.DataFrame({      
        'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
       })
    
    df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
    
    # datetime from unix first, leaves NaT for invalid values
    df['datetime'] = pd.to_datetime(df['unix'], unit='ms')
    # where datetime is not NaT, add the offset:
    df.loc[~df['datetime'].isnull(), 'datetime'] += (
                      pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
              )
    
    # or without the apply, but by using an underscored method:
    # df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') + 
    #                   pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
    
    df['datetime']
    # 0   2022-01-21 00:01:00
    # 1   2022-01-21 00:15:00
    # 2                   NaT
    # Name: datetime, dtype: datetime64[ns]
    

    Unfortunately, you'll have to use an underscored ("private") method, if you want to avoid the apply. This also only works if you have a constant offset, i.e. if it's the same offset throughout the whole series.