I have a date column stored as strings in the Zulu (UTC) time zone. I tried to get rid of the "Z" character with a regular expression, but I guess there is a more efficient way.
input:
|index | date | municipality
|------| --------------------|--------------
| 0 | 07.02.2021 1017Z | Algier
| 1 | 11.01.2019 1716Z | Abuja
| 2 | 23.02.2018 1002Z | Brüssel
| 3 | 19.07.2021 1459Z | Brüssel
| 4 | 26.11.2019 1049Z | Berlin
desired outcome:
|index | date | municipality
|------| --------------------|--------------
| 0 | 2021-02-07 | Algier
| 1 | 2019-01-11 | Abuja
| 2 | 2018-02-23 | Bruxelles
| 3 | 2021-07-19 | Bruxelles
| 4 | 2019-11-26 | Berlin
Instead of getting rid of the Z character, parse it correctly. Example:
import pandas as pd
df = pd.DataFrame({'date': ['07.02.2021 1017Z', '11.01.2019 1716Z']})
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y %H%M%z')
# df['date']
# Out[19]:
# 0 2021-02-07 10:17:00+00:00
# 1 2019-01-11 17:16:00+00:00
# Name: date, dtype: datetime64[ns, UTC]
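Note that the format keyword of pd.to_datetime is optional in general, but specifying it explicitly is more reliable. Here it also makes clear that a date such as 07.02.2021 is day-first (7 February), not month-first.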
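As a rough sketch of why an explicit format helps with reliability: combined with `errors='coerce'` (a choice made here for illustration, not something the answer above prescribes), strings that do not match the pattern become `NaT` and can be inspected instead of being silently misparsed. The sample series and its malformed last entry are hypothetical.

```python
import pandas as pd

# Hypothetical sample; the last entry is deliberately malformed.
raw = pd.Series(['07.02.2021 1017Z', '11.01.2019 1716Z', 'not a date'])

# With an explicit format, values that do not match it become NaT
# when errors='coerce' is set, instead of raising an exception.
parsed = pd.to_datetime(raw, format='%d.%m.%Y %H%M%z', errors='coerce')

# Collect the original strings that failed to parse for inspection.
bad_rows = raw[parsed.isna()]
print(bad_rows.tolist())
```

Whether to coerce or to let the parser raise depends on how much you trust the input; raising (the default) is the stricter option.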
You can also floor the timestamps to the day if you don't need the time of day:
df['date'] = df['date'].dt.floor('D')
# df['date']
# Out[21]:
# 0 2021-02-07 00:00:00+00:00
# 1 2019-01-11 00:00:00+00:00
# Name: date, dtype: datetime64[ns, UTC]
...or format the dates as strings:
df['date'].dt.strftime('%Y-%m-%d')
# 0 2021-02-07
# 1 2019-01-11
# Name: date, dtype: object
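Putting the pieces together on the data from the question, a minimal end-to-end sketch might look like the following. The variable name `parsed` is only illustrative, and the municipality column is left untouched.

```python
import pandas as pd

# Data copied from the question's input table.
df = pd.DataFrame({
    'date': ['07.02.2021 1017Z', '11.01.2019 1716Z', '23.02.2018 1002Z',
             '19.07.2021 1459Z', '26.11.2019 1049Z'],
    'municipality': ['Algier', 'Abuja', 'Brüssel', 'Brüssel', 'Berlin'],
})

# Parse the strings, letting %z consume the trailing "Z" as UTC.
parsed = pd.to_datetime(df['date'], format='%d.%m.%Y %H%M%z')

# Keep only the calendar date, formatted as YYYY-MM-DD strings.
df['date'] = parsed.dt.strftime('%Y-%m-%d')
print(df)
```

If you would rather keep a datetime dtype than strings, assign `parsed.dt.floor('D')` instead of the `strftime` result.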