Search code examples
pythonpandasdataframedatetimeoperation

compare two dataframe according to the year and month


I have two dataframe:

dfr1

                    qq
date                  
1956-01-01  685.519348
1956-01-02  731.868500
1956-01-03  510.579375
1956-01-04  412.347250
1956-01-05  358.297625
2010-12-27  3.992000
2010-12-28  1.099583
2010-12-29  104.428958
2010-12-30  4.932750
2010-12-31  101.737292
2013-12-27  7.992000
2013-12-28  105.099583
2013-12-29  104.428958
2013-12-30  102.932750
2013-12-31  101.737292

and

dfr2

                    q_t
01-01  61.629342
01-02  61.409750
01-03  61.309208
01-04  61.161462
01-05  61.020508
12-27  69.065375
12-28  68.935908
12-29  68.603104
12-30  68.474458
12-31  68.209075

As you can notice, the days and month are the same in both but the first one has year while the second one not.

I would like to compare the first with the second. In particular, I would like to know when the qq value in the first one is less or equal to the value in the second according to the day and month of the second. This is thus what I expects:

1956-01-01  685.519348  61.629342   False
1956-01-02  731.8685    61.40975    False
1956-01-03  510.579375  61.309208   False
1956-01-04  412.34725   61.161462   False
1956-01-05  358.297625  61.020508   False
2010-12-27  3.992       69.065375   True
2010-12-28  1.099583    68.935908   True
2010-12-29  104.428958  68.603104   False
2010-12-30  4.93275     68.474458   True
2010-12-31  101.737292  68.209075   False
2013-12-27  7.992       69.065375   True
2013-12-28  105.099583  68.935908   False
2013-12-29  104.428958  68.603104   False
2013-12-30  102.93275   68.474458   False
2013-12-31  101.737292  68.209075   False

I tried compared and, as expected, I got an error:

dfr1.compare(dfr1)
*** ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects

I tried also:

dfr_1['new'] = dfr_1 < dfr_2
*** ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects

I see thus two problem:

  1. the two dataframes have different dimension
  2. the two dataframes have different indexes.

Specifically, I am not able to select properly the indexes properties.

What do you think?


Solution

  • # create some joining columns called mmdd, e.g. '12-31'
    dfr1['mmdd'] = dfr1['date'].dt.strftime('%m-%d')
    dfr2.index.name = 'mmdd'
    
    pd.merge(dfr1, dfr2, on='mmdd', how = 'left').drop('mmdd', axis=1).assign(lt= lambda x: x['qq'] < x['q_t'])
    

    Output:

             date          qq        q_t     lt
    0  1956-01-01  685.519348  61.629342  False
    1  1956-01-02  731.868500  61.409750  False
    2  1956-01-03  510.579375  61.309208  False
    3  1956-01-04  412.347250  61.161462  False
    4  1956-01-05  358.297625  61.020508  False
    5  2010-12-27    3.992000  69.065375   True
    6  2010-12-28    1.099583  68.935908   True
    7  2010-12-29  104.428958  68.603104  False
    8  2010-12-30    4.932750  68.474458   True
    9  2010-12-31  101.737292  68.209075  False
    10 2013-12-27    7.992000  69.065375   True
    11 2013-12-28  105.099583  68.935908  False
    12 2013-12-29  104.428958  68.603104  False
    13 2013-12-30  102.932750  68.474458  False
    14 2013-12-31  101.737292  68.209075  False