Search code examples
pythonpandasdatetimedstpytz

Pandas not recognizing TZ aware datetime


I'm trying to write a script which takes in a date and returns a pandas dataframe containing zeroes with localized datetimes in half hour periods for the index. SP refers to settlement period which is half an hour. The scripts works really well on most days but when it changes to or from daylight savings time I get:

AmbiguousTimeError: Cannot infer dst time from '2017-10-29 01:00:00', try using the 'ambiguous' argument

This happens at the line:

df_datetime.at[datetime, "Generation"] = 0

The date it tries to convert is '2017-10-29 01:00:00' even though it is given '2017-10-29 01:00:00 + 1:00:00', would I have to convert to UTC=0, convert in pandas then localize again? the 29th October 2017 is the day that daylight savings time finished in the UK.

The full script is here below:

import pandas as pd
from datetime import datetime, timedelta
import pytz


def SP_to_time_delta(SP):
    dec_hour = (SP - 1)/2

    hour = int(dec_hour)
    if abs(int(dec_hour)-dec_hour) == 0.5:
        minute = 30
    else:
        minute = 0

    SP_timedelta = timedelta(hours=hour, minutes=minute)
    return SP_timedelta

def localize_datetime_UK(date, SP):
    pytz_tz = pytz.timezone('Europe/London')
    local_date = pytz_tz.localize(date)
    SP_timedelta = SP_to_time_delta(SP)

    local_datetime = local_date+SP_timedelta
    return local_datetime    

def get_datetime_df(SettlementDate, max_SP):
    df_datetime = pd.DataFrame([])
    for i in range(max_SP+1):
        datetime = localize_datetime_UK(SettlementDate, i)
        df_datetime.at[datetime, "Generation"] = 0
    df_datetime = df_datetime.sort_index()
    return df_datetime    

SettlementDate = datetime(2017, 10, 29) 
df_datetime = get_datetime_df(SettlementDate, 50)   

What's the best way to solve this?

Thanks for all replies and help!


Solution

  • Your problem originates due to a mixup in daylight saving time. Consider the following:

    import pandas as pd
    from datetime import datetime, timedelta
    import pytz
    
    pytz_tz = pytz.timezone('Europe/London')
    
    datetime_1 = pytz_tz.localize(datetime(2017, 10, 29) \
                 + timedelta(hours=0, minutes=0))
    datetime_2 = pytz_tz.localize(datetime(2017, 10, 29) \
                 + timedelta(hours=1, minutes=0))
    
    print(datetime_1)
    print(datetime_2)
    
    > 2017-10-29 00:00:00+01:00
    > 2017-10-29 01:00:00+00:00
    

    As you can see, you're handling two times the "same" point in time but with different encodings.

    One solution could be to convert all times to UTC (which has no dst) and only convert the times back when needed for output.

    However, I think that you are adding the timedelta at the wrong place. Consider

    datetime_1 = pytz_tz.localize(datetime(2017, 10, 29)) \  # localize first
                 + timedelta(hours=0)  # then add the delta
    datetime_2 = pytz_tz.localize(datetime(2017, 10, 29)) \ # localize first
                 + timedelta(hours=1)  # then add the delta
    
    > 2017-10-29 00:00:00+01:00
    > 2017-10-29 01:00:00+01:00
    

    which yields unique results. Check the semantics of your application which version applies.

    ===== old answer ======

    You have to make the datetime object timezone aware using

    from datetime import datetime, timedelta
    from pytz import timezone
    import pytz
    
    eastern = timezone('US/Eastern')
    SelltementDate = eastern.localize(datetime(2017, 10, 29, 0, 0, 0))
    

    Refer also to the documentation of pytz.