I would like to convert a range of datetimes to the UTC timezone. The following code takes more than three minutes for 500_000 entries.
How can I speed up this process?
import datetime
from pytz import timezone
import pytz
import pandas as pd
import time
abc = pd.date_range(start='2020-03-28 05:00:00', periods=500_000, freq='5min')
UTC = pytz.timezone('UTC')
BERLIN = pytz.timezone('Europe/Berlin')
print("abc[0]=\n", abc[0])
print("abc[-1]=\n", abc[-1])
myList = []
my_time = time.time()
for runner in abc:
    localizedToBerlin = BERLIN.localize(runner)
    localizedToBerlinAsUtc = localizedToBerlin.astimezone(UTC)
    myList.append([runner, localizedToBerlinAsUtc])
print('runtime:', time.time() - my_time)
results in:
abc[0]=
2020-03-28 05:00:00
abc[-1]=
2024-12-28 07:35:00
runtime: 209.57262253761292
pandas built-in - if you work with/in pandas, try to avoid loops and use the built-ins, e.g. tz_convert. From Europe/Berlin to UTC:
import pandas as pd
dr = pd.date_range(start='2020-03-28 05:00:00', periods=500_000, freq='5min',
                   tz='Europe/Berlin')
%timeit dr.tz_convert('UTC')
77.2 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Localization from naive to Europe/Berlin and then to UTC:
dr = pd.date_range(start='2020-03-28 05:00:00', periods=500_000, freq='5min')
%timeit dr.tz_localize('Europe/Berlin', nonexistent='NaT', ambiguous='NaT').tz_convert('UTC')
69.5 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
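Applied to the loop from the question, a minimal sketch could look like the following. The variable names and the smaller period count are only for illustration, and nonexistent='NaT' / ambiguous='NaT' is just one possible policy for wall times that fall into a DST transition.

```python
import pandas as pd

# Naive timestamps, as in the question (fewer periods to keep the example small).
naive = pd.date_range(start='2020-03-28 05:00:00', periods=1000, freq='5min')

# Interpret the naive values as Europe/Berlin wall time. Wall times that do not
# exist or are ambiguous because of a DST switch become NaT under this policy.
berlin = naive.tz_localize('Europe/Berlin', nonexistent='NaT', ambiguous='NaT')

# Convert the whole index to UTC in one vectorized call instead of a Python loop.
utc = berlin.tz_convert('UTC')

# Pair each original value with its UTC counterpart, like myList in the question.
result = pd.DataFrame({'naive': naive, 'utc': utc})
print(result.head())
```

Rows whose 'utc' value is NaT mark wall times that could not be mapped unambiguously, so they are easy to inspect or handle separately afterwards.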
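UTC first - Also note that it is much faster to localize naive to UTC and then convert to another timezone, since UTC localization involves no computation of DST changes etc. This only applies if the naive timestamps actually represent UTC; localizing Europe/Berlin wall times as UTC yields different points in time.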
dr = pd.date_range(start='2020-03-28 05:00:00', periods=500_000, freq='5min')
%timeit dr.tz_localize('UTC').tz_convert('Europe/Berlin')
173 µs ± 2.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
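To make the difference between the two localization routes concrete, here is a small sketch (the variable names are made up for illustration). The same naive values end up as different points in time depending on which zone they are assumed to be in:

```python
import pandas as pd

naive = pd.date_range(start='2020-03-28 05:00:00', periods=3, freq='5min')

# Route 1: the naive values are taken to be UTC, then shown as Berlin time.
as_utc = naive.tz_localize('UTC').tz_convert('Europe/Berlin')

# Route 2: the naive values are taken to be Berlin wall time.
as_berlin = naive.tz_localize('Europe/Berlin')

# Both are Berlin-aware now, but they do not describe the same instants.
print(as_utc[0], as_berlin[0])
print(as_utc[0] - as_berlin[0])
```

So the fast UTC-first route is the right one only when the source data is known to be in UTC.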
Working with lists - if you're not working with pandas data structures or similar and have to use lists, localization to UTC and then to another timezone still performs (relatively) ok:
import pytz
l = dr.to_list()
l_utc = list(map(pytz.utc.localize, l))
# %timeit list(map(pytz.utc.localize, l))
# 1.44 s ± 7.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
cet = pytz.timezone('Europe/Berlin') # CEST at the moment
l_cet = list(map(lambda t: t.astimezone(cet), l_utc))
# %timeit list(map(lambda t: t.astimezone(cet), l_utc))
# 3.24 s ± 10.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Going directly from naive to a certain timezone is still a pain with pytz:
%timeit list(map(cet.localize, l))
2min 9s ± 7.31 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
dateutil vs. pytz - An alternative here would be to use dateutil. Since it uses the same time zone model as Python, you can use replace():
import dateutil.tz  # import the submodule explicitly; a bare "import dateutil" may not expose dateutil.tz
d_cet = dateutil.tz.gettz('Europe/Berlin')
%timeit [t.replace(tzinfo=d_cet) for t in l]
5.67 s ± 357 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
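For the original goal (Berlin wall time to UTC) with plain lists, a minimal sketch using dateutil could look like this. The two sample datetimes and the variable names are assumptions for illustration only. Note that replace() just attaches the tzinfo; unlike pytz's localize(), it does not raise for nonexistent or ambiguous wall times.

```python
from datetime import datetime, timezone

from dateutil import tz

# Time zone object from dateutil; it can be attached directly via replace().
berlin = tz.gettz('Europe/Berlin')

# Naive datetimes that are meant as Berlin wall time (one winter, one summer).
naive_times = [datetime(2020, 3, 28, 5, 0), datetime(2020, 7, 1, 12, 0)]

# Attach the Berlin tzinfo, then convert each value to UTC.
utc_times = [t.replace(tzinfo=berlin).astimezone(timezone.utc) for t in naive_times]

for before, after in zip(naive_times, utc_times):
    print(before, '->', after)
```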