Search code examples
pythonpandasdatetimepytz

How to check that a datetime exists in a given timezone in Python?


I need to read as dataframe temperature data provided to me as CSV. The date column is supposed to be localized Europe/Paris timezone but the data provider gives data even when the datetime does not exist because of daylight saving. So if I run:

import pandas as pd
from io import StringIO
from pytz import timezone

csv_string = StringIO("date;temp\n\
2014-03-29 22:00:00;12,5\n\
2014-03-29 23:00:00;12,4\n\
2014-03-30 00:00:00;10,7\n\
2014-03-30 01:00:00;11,7\n\
2014-03-30 02:00:00;12,4\n\
2014-03-30 03:00:00;12,4\n\
2014-03-30 04:00:00;10,7\n\
2014-03-30 05:00:00;10,4\n\
2014-03-30 06:00:00;9,4")

df = pd.read_csv(csv_string, sep=";", decimal=",")
df["date"] = pd.to_datetime(df["date"], infer_datetime_format=True)
df.set_index("date", inplace=True)
df.index = df.index.tz_localize(timezone('Europe/Paris'))

I get:

pytz.exceptions.NonExistentTimeError: 2014-03-30 02:00:00

I am looking for a way to drop rows which will raise NonExistentTimeError. How can I test these rows using pandas, pytz or another lib?


Solution

  • I ended up with the following:

    import pandas as pd
    from io import StringIO
    from pytz import timezone
    
    csv_string = StringIO("date;temp\n\
    2014-03-29 22:00:00;12,5\n\
    2014-03-29 23:00:00;12,4\n\
    2014-03-30 00:00:00;10,7\n\
    2014-03-30 01:00:00;11,7\n\
    2014-03-30 02:00:00;12,4\n\
    2014-03-30 03:00:00;12,4\n\
    2014-03-30 04:00:00;10,7\n\
    2014-03-30 05:00:00;10,4\n\
    2014-03-30 06:00:00;9,4")
    
    def datetime_exists_in_tz(dt, tz):
        try:
            dt.tz_localize(tz)
            return True
        except:
            return False
    
    df = pd.read_csv(csv_string, sep=";", decimal=",")
    df["date"] = pd.to_datetime(df["date"], infer_datetime_format=True)
    df = df.loc[df["date"].apply(datetime_exists_in_tz, tz=timezone('Europe/Paris'))]
    df.set_index("date", inplace=True)
    df.index = df.index.tz_localize(timezone('Europe/Paris'))
    

    Still open to more elegant and more efficient solutions :)