Search code examples
pythonpandasdatedatetimeutc

How to convert dates to UTC taking into account winter and summer time?


My objective is to convert a list of dates to the correct UTC format, i.e.: taking into consideration winter and summer time. Dates were scrapped from a chat.

I am Switzerland based which uses (CET) UTC+01:00 during the winter and (CEST) Central European Summer Time during the summer i.e. UTC+02:00.

Here are the dates:

import pandas as pd

df = pd.DataFrame()
df['dates_raw'] = [
    '2022-01-20 01:12:15',
    '2022-06-22 12:00:00',
    '2022-10-29 05:57:02',
    '2022-12-18 09:34:17',
    '2023-01-12 06:36:10',
    '2023-02-17 20:23:10',
    '2023-04-12 02:02:24',
    '2023-09-12 15:57:35',]

And here the desired result:

df['dates_converted'] = [
    '2022-01-20 00:12:15',
    '2022-06-22 10:00:00',
    '2022-10-29 03:57:02',
    '2022-12-18 08:34:17',
    '2023-01-12 05:36:10',
    '2023-02-17 19:23:10',
    '2023-04-12 00:02:24',
    '2023-09-12 13:57:35',]

And finally, here are the dates changes from summer to winter to summer... etc. etc.:

dates_changes = {
    'st_2022' : '2022-03-27 02:00:00', # UTC + 2 (we gain one hour)
    'wt_2022' : '2022-10-30 03:00:00', # UTC + 1 (we loose one hour)
    'st_2023' : '2023-03-26 02:00:00', # UTC + 2 (we gain one hour)
    'wt_2023' : '2023-10-29 03:00:00', # UTC + 1 (we loose one hour)
    'st_2024' : '2024-03-31 02:00:00', # UTC + 2
    'wt_2024' : '2024-10-27 03:00:00', # UTC + 1
    }

As the dates changes look arbitrary, I don't know if there is any built-in function to make the conversion.

Many thanks in advance!


Solution

  • This can be done by first using pd.to_datetime to convert the strings to naive datetimes, then tz_localize to interpret those naive datetimes as aware datetimes in your desired timezone, and finally tz_convert to convert to UTC.

    df['dates_converted'] = pd.to_datetime(df['dates_raw']).dt.tz_localize('Europe/Zurich').dt.tz_convert('UTC')
    print(df['dates_converted'])
    

    result:

    0   2022-01-20 00:12:15+00:00
    1   2022-06-22 10:00:00+00:00
    2   2022-10-29 03:57:02+00:00
    3   2022-12-18 08:34:17+00:00
    4   2023-01-12 05:36:10+00:00
    5   2023-02-17 19:23:10+00:00
    6   2023-04-12 00:02:24+00:00
    7   2023-09-12 13:57:35+00:00
    Name: dates_converted, dtype: datetime64[ns, UTC]
    

    This code as it stands will give an exception with some ambiguous datetimes. For example, what is 2022-10-30 02:30:00? Is it 00:30 UTC or 01:30 UTC? - you can't tell. If your data contains such times, you can handle it by passing the ambiguous arg to tz_localize: check the Pandas documentation for the options here and choose the one that will best deal with your data.