Search code examples
pythonpandasmergetimestampconcatenation

Python combine monthly and minutes dataframes with TZ-aware datetime index


I have two time-series below. Datetime indices are TZ-aware.

df1: Five minutes interval

    value_1
Timestamp                                   
2009-04-01 10:50:00+09:30   50
2009-04-05 11:55:00+09:30   55
2009-04-23 16:00:00+09:30   0
2009-05-03 10:50:00+09:30   50
2009-05-07 11:55:00+09:30   55
2009-05-11 16:00:00+09:30   0
2009-07-04 02:05:00+09:30   5
2009-07-21 09:10:00+09:30   10
2009-07-30 12:15:00+09:30   15
2010-09-02 11:25:00+09:30   25
2010-09-22 15:30:00+09:30   30
2010-09-30 06:15:00+09:30   15
2010-12-06 11:25:00+09:30   25
2010-12-22 15:30:00+09:30   30
2010-12-28 06:15:00+09:30   15

df2: Monthly interval obtained by groupby('Month') from a different dataset.

    value_2
Timestamp               
2009-04-30 00:00:00+09:30   23
2009-07-31 00:00:00+09:30   28
2010-12-31 00:00:00+09:30   23

I want to combine the two datasets by index. Any record in df1 should be included in the final results if it has the same month as df2. The expected result is below.

    value_1 value_2
Timestamp                                   
2009-04-01 10:50:00+09:30   50  23
2009-04-05 11:55:00+09:30   55  23
2009-04-23 16:00:00+09:30   0   23
2009-07-04 02:05:00+09:30   5   28
2009-07-21 09:10:00+09:30   10  28
2009-07-30 12:15:00+09:30   15  28
2010-12-06 11:25:00+09:30   25  23
2010-12-22 15:30:00+09:30   30  23
2010-12-28 06:15:00+09:30   15  23

This is my attempt.

result = pd.concat([df1, df2], axis=1) 
# this combines the datasets, but not like expected, also by including join="outer". With join="inner", no data shown.

result = pd.merge(df1, df2, left_on='value_1', right_index=True)
# this return ValueError: You are trying to merge on Int64 and datetime64[ns, Australia/North] columns. If you wish to proceed you should use pd.concat

# Using @Ben.T
mt_hMF = df1.merge( df2.reset_index().set_index(df2.index.floor('M')),
                         how='left', left_index=True, right_index=True).set_index('Timestamp')
# This gives ValueError: <MonthEnd> is a non-fixed frequency

Solution

  • Try this, using strftime to create a temporary merge key for both dataframes:

    df1.reset_index()\
       .assign(yearmonth=df1.index.strftime('%Y%m'))\
       .merge(df2.assign(yearmonth=df2.index.strftime('%Y%m')))\
       .set_index('Timestamp')\
       .drop('yearmonth', axis=1)
    

    Output:

        value_1  value_2
    Timestamp                                  
    2009-04-01 10:50:00+09:30       50       23
    2009-04-05 11:55:00+09:30       55       23
    2009-04-23 16:00:00+09:30        0       23
    2009-07-04 02:05:00+09:30        5       28
    2009-07-21 09:10:00+09:30       10       28
    2009-07-30 12:15:00+09:30       15       28
    2010-12-06 11:25:00+09:30       25       23
    2010-12-22 15:30:00+09:30       30       23
    2010-12-28 06:15:00+09:30       15       23