Search code examples
pythontimezonepandaspytz

How to set the offset of timestamps in a pandas dataframe?


I'm having a DataFrame with two columns. One column is filled with timestamps, the other column contains the offset in hours to UTC of the timestamp in the same row.

The DataFrame looks like this:

In [44]: df
Out[44]:
                     DATETIME  OFFSET

0   2013-01-01 00:00:00+00:00       1
1   2013-01-01 01:00:00+00:00       1
2   2013-01-01 02:00:00+00:00       1
3   2013-01-01 03:00:00+00:00       1
4   2013-01-01 04:00:00+00:00       1
5   2013-01-01 05:00:00+00:00       1
6   2013-01-01 06:00:00+00:00       2
7   2013-01-01 07:00:00+00:00       2
8   2013-01-01 08:00:00+00:00       2

What i like to achieve is to add the offset per row to the timestamp:

In [44]: df
Out[44]:
                     DATETIME  OFFSET

0   2013-01-01 00:00:00+01:00       1
1   2013-01-01 01:00:00+01:00       1
2   2013-01-01 02:00:00+01:00       1
3   2013-01-01 03:00:00+01:00       1
4   2013-01-01 04:00:00+01:00       1
5   2013-01-01 05:00:00+01:00       1
6   2013-01-01 06:00:00+02:00       2
7   2013-01-01 07:00:00+02:00       2
8   2013-01-01 08:00:00+02:00       2

I've tried with to replace tzinfo but failed to find a proper solution. I'm thinking about something like the following (pseudo code):

df.apply(lambda x: x['DATETIME'].replace(tzinfo=pytz.utc + x['OFFSET'])

Any help is appreciated.

Thanks, Thomas


Solution

  • It looks like pytz.FixedOffset fits this purpose.

    In [39]: df.apply(lambda x: pd.Timestamp(x['DATETIME'], tz=pytz.FixedOffset(60*x['OFFSET'])), axis=1)
    Out[39]: 
    0    2013-01-01 00:00:00+01:00
    1    2013-01-01 01:00:00+01:00
    2    2013-01-01 02:00:00+01:00
    3    2013-01-01 03:00:00+01:00
    4    2013-01-01 04:00:00+01:00
    5    2013-01-01 05:00:00+01:00
    6    2013-01-01 06:00:00+02:00
    7    2013-01-01 07:00:00+02:00
    8    2013-01-01 08:00:00+02:00
    dtype: object
    

    Others around here use time series more than I do, so this may not be best practice.