I have a dataframe that has a timezone-aware index, and when I try to insert a new row into this dataframe, it doesn't work, instead changing the type of the index to 'object'
(ie, it can't add in the row into the current type).
MRE below:
import pandas as pd
df = pd.DataFrame({"time": ["2021/06/06 12:00:00"], "col1": [2]})
df.index = pd.to_datetime(df['time'])
df = df.drop('time', axis=1)
df.index = df.index.tz_localize('UTC')
# Doesn't work, index is now 'object' as this is considered a string
# row = pd.Series(name='2021/06/05 12:00:00')
# df = df.append(row)
# Also doesn't work, as timezones differ
row = pd.Series(name=pd.Timestamp('2021/06/05 12:00:00'))
df = df.append(row)
print(df.index)
I understand I can do the following:
tz = df.index[0].tz
row = pd.Series(name=pd.Timestamp('2021/06/05 12:00:00', tz=tz))
However I'm sceptical of whether this would work when the units differ, or maybe even some other property of a pandas Timestamp
I don't know of, so would ideally like to completely copy over the Timestamp
configuration of the index to the new timestamp I'm inserting.
If anyone would happen to know how to insert a new row into this dataframe whilst keeping the index type intact that would be greatly appreciated.
Since DatetimeTZ
only supports a single timezone, the key is to make sure the new Timestamp
always matches your existing df.index.tz
.
When inserting a new Timestamp
:
tz=df.index.tz
if it's from the same time zoneastimezone(df.index.tz)
if it's from a different time zoneGiven your MRE df
:
df
# col1
# time
# 2021-06-06 12:00:00+00:00 2
df.index.dtype
# datetime64[ns, UTC]
If you know your new Timestamp
is in the same tz
, just set it in the constructor like you mentioned:
ts1 = pd.Timestamp('2021/06/05 12:00:00', tz=df.index.tz)
df.loc[ts1] = 4
# col1
# time
# 2021-06-06 12:00:00+00:00 2
# 2021-06-05 12:00:00+00:00 4
df.index.dtype
# datetime64[ns, UTC]
If your new Timestamp
is in a different tz
(which was your main concern), convert it using astimezone
(e.g., from Eastern US back to UTC):
ts2 = pd.Timestamp('2021/06/05 12:00:00', tz='US/Eastern').astimezone(df.index.tz)
df.loc[ts2] = 6
# col1
# time
# 2021-06-06 12:00:00+00:00 2
# 2021-06-05 12:00:00+00:00 4
# 2021-06-05 16:00:00+00:00 6
df.index.dtype
# datetime64[ns, UTC]
Note that I'm using loc
because it's faster and simpler, but append
still works:
ts3 = pd.Timestamp('2021/06/05 12:00:00', tz='US/Central').astimezone(df.index.tz)
row = pd.Series([8], name=ts3, index=['col1'])
df = df.append(row)
# col1
# time
# 2021-06-06 12:00:00+00:00 2
# 2021-06-05 12:00:00+00:00 4
# 2021-06-05 16:00:00+00:00 6
# 2021-06-05 17:00:00+00:00 8
df.index.dtype
# datetime64[ns, UTC]