Search code examples
pythonpandaspytz

How to Convert EST to Local Time with Dataframe


I'm trying to convert timestamps in EST to various localized timestamps in a pandas dataframe. I have a dataframe with timestamps in EST and a timezone into which they need to be converted.

I know that there are several threads already on topics like this. However, they either start in UTC or I can't replicate with my data.

Before writing, I consulted: How to convert GMT time to EST time using python

I imported the data:

import pandas
import datetime as dt
import pytz 

transaction_timestamp_est         local_timezone

2013-05-28 05:18:00+00:00         America/Chicago
2013-06-12 05:23:20+00:00         America/Los_Angeles
2014-06-21 05:26:26+00:00         America/New_York

I converted to datetime and created the following function:

df.transaction_timestamp_est = 
pd.to_datetime(df.transaction_timestamp_est)


def db_time_to_local(row):

    db_tz = pytz.timezone('America/New_York')
    local_tz = pytz.timezone(row['local_timezone'])

    db_date = db_tz.localize(row['transaction_timestamp_est'])
    local_date = db_date.astimezone(local_tz)

    return local_date

I run it here:

df['local_timestamp'] = df.apply(db_time_to_local, axis=1)

And get this error:

ValueError: ('Not naive datetime (tzinfo is already set)', 'occurred at index 0')

I expect a new column in the dataframe called 'local_timestamp' that has the timestamp adjusted according to the data in the local_timezone column.

Any help is appreciated!


Solution

  • The error you see looks like its because you are trying to localize a tz-aware timestamp. The '+00:00' in your timestamps indicates these are tz-aware, in UTC (or something like it).

    Some terminology: a naive date/time has no concept of timezone, a tz-aware (or localised) one is associated with a particular timezone. Localizing refers to converting a tz-naive date/time to a tz-aware one. By definition you can't localize a tz-aware date/time: you either either convert it to naive and then localize, or convert directly to the target timezone.

    To get that column into EST, convert to naive and then localize to EST:

    In [98]: df['transaction_timestamp_est'] = df['transaction_timestamp_est'].dt.tz_localize(None).dt.tz_localize('EST') 
    In [99]: df
    Out [99]:
    
    0   2013-05-28 05:18:00-05:00
    1   2013-06-12 05:23:20-05:00
    2   2014-06-21 05:26:26-05:00
    Name: transaction_timestamp_est, dtype: datetime64[ns, EST]
    

    Note the 'EST' in the dtype. Then, you can convert each timestamp to its target timezone:

    In [100]: df['local_ts'] = df.apply(lambda x: x[0].tz_convert(x[1]), axis=1)                                        
    
    In [101]: df                                                                                                        
    Out[101]: 
      transaction_timestamp_est       local_timezone                   local_ts
    0 2013-05-28 05:18:00-05:00      America/Chicago  2013-05-28 05:18:00-05:00
    1 2013-06-12 05:23:20-05:00  America/Los_Angeles  2013-06-12 03:23:20-07:00
    2 2014-06-21 05:26:26-05:00     America/New_York  2014-06-21 06:26:26-04:00
    

    To explain: each element in the first column is of type pd.Timestamp. Its tz_convert() method changes its timezone, converting the date/time to the new zone.

    This produces a column of pd.Timestamps with a mixture of timezones, which is a pain to handle in pandas. Most (perhaps all) pandas functions that operate on columns of date/times require the whole column to have the same timezone.

    If you prefer, convert to tz-naive:

    In [102]: df['local_ts'] = df.apply(lambda x: x[0].tz_convert(x[1]).tz_convert(None), axis=1)                                                      
    
    In [103]: df                                                                                                                                       
    Out[103]: 
      transaction_timestamp_est       local_timezone            local_ts
    0 2013-05-28 05:18:00-05:00      America/Chicago 2013-05-28 10:18:00
    1 2013-06-12 05:23:20-05:00  America/Los_Angeles 2013-06-12 10:23:20
    2 2014-06-21 05:26:26-05:00     America/New_York 2014-06-21 10:26:26
    

    If your data allows, its better to try to keep columns of timestamps (or indices) in a single timezone. UTC is usually best as it doesnt have DST transitions or other issues that can result in missing / ambiguous times, as most other timezones do.