Search code examples
pythonpandasdataframedatetimeindex

Pandas insert row into dataframe with timezone-aware index


I have a dataframe that has a timezone-aware index, and when I try to insert a new row into this dataframe, it doesn't work, instead changing the type of the index to 'object' (ie, it can't add in the row into the current type).

MRE below:

import pandas as pd

df = pd.DataFrame({"time": ["2021/06/06 12:00:00"], "col1": [2]})
df.index = pd.to_datetime(df['time'])
df = df.drop('time', axis=1)
df.index = df.index.tz_localize('UTC')

# Doesn't work, index is now 'object' as this is considered a string
# row = pd.Series(name='2021/06/05 12:00:00')
# df = df.append(row)


# Also doesn't work, as timezones differ
row = pd.Series(name=pd.Timestamp('2021/06/05 12:00:00'))
df = df.append(row)

print(df.index)

I understand I can do the following:

tz = df.index[0].tz
row = pd.Series(name=pd.Timestamp('2021/06/05 12:00:00', tz=tz))

However I'm sceptical of whether this would work when the units differ, or maybe even some other property of a pandas Timestamp I don't know of, so would ideally like to completely copy over the Timestamp configuration of the index to the new timestamp I'm inserting.

If anyone would happen to know how to insert a new row into this dataframe whilst keeping the index type intact that would be greatly appreciated.


Solution

  • Since DatetimeTZ only supports a single timezone, the key is to make sure the new Timestamp always matches your existing df.index.tz.

    When inserting a new Timestamp:

    • Use tz=df.index.tz if it's from the same time zone
    • Use astimezone(df.index.tz) if it's from a different time zone

    Examples

    • Given your MRE df:

      df
      #                            col1
      # time                           
      # 2021-06-06 12:00:00+00:00     2
      
      df.index.dtype
      # datetime64[ns, UTC]
      
    • If you know your new Timestamp is in the same tz, just set it in the constructor like you mentioned:

      ts1 = pd.Timestamp('2021/06/05 12:00:00', tz=df.index.tz)
      df.loc[ts1] = 4
      #                            col1
      # time                           
      # 2021-06-06 12:00:00+00:00     2
      # 2021-06-05 12:00:00+00:00     4
      
      df.index.dtype
      # datetime64[ns, UTC]
      
    • If your new Timestamp is in a different tz (which was your main concern), convert it using astimezone (e.g., from Eastern US back to UTC):

      ts2 = pd.Timestamp('2021/06/05 12:00:00', tz='US/Eastern').astimezone(df.index.tz)
      df.loc[ts2] = 6
      #                            col1
      # time                           
      # 2021-06-06 12:00:00+00:00     2
      # 2021-06-05 12:00:00+00:00     4
      # 2021-06-05 16:00:00+00:00     6
      
      df.index.dtype
      # datetime64[ns, UTC]
      
    • Note that I'm using loc because it's faster and simpler, but append still works:

      ts3 = pd.Timestamp('2021/06/05 12:00:00', tz='US/Central').astimezone(df.index.tz)
      row = pd.Series([8], name=ts3, index=['col1'])
      df = df.append(row)
      #                            col1
      # time                           
      # 2021-06-06 12:00:00+00:00     2
      # 2021-06-05 12:00:00+00:00     4
      # 2021-06-05 16:00:00+00:00     6
      # 2021-06-05 17:00:00+00:00     8
      
      df.index.dtype
      # datetime64[ns, UTC]