Search code examples
pythonpandasdataframegroup-bydate-difference

Calculating the range of column value datewise through Python


enter image description here

I want to calculate the maximum difference of product_mrp according to the dates. For that I was trying to group by date but not able to get after that.

INPUT:

+-------------+--------------------+
| product_mrp |     order_date     |
+-------------+--------------------+
|         142 |         01-12-2019 |
|          20 |         01-12-2019 |
|          20 |         01-12-2019 |
|         120 |         01-12-2019 |
|          30 |         03-12-2019 |
|          20 |         03-12-2019 |
|          45 |         03-12-2019 |
|         215 |         03-12-2019 |
|          15 |         03-12-2019 |
|          25 |         07-12-2019 |
|           5 |         07-12-2019 |
+-------------+--------------------+

EXPECTED OUTPUT:

 +-------------+--------------------+
| product_mrp |     order_date     |
+-------------+--------------------+
|         122 |         01-12-2019 |
|         200 |         03-12-2019 |
|          20 |         07-12-2019 |
+-------------+--------------------+

Solution

  • Use pandas to load the data, then use groupby to group by the shared index:

    import pandas as pd
    
    dates = ['01-12-2019']*4 + ['03-12-2019']*5 + ['07-12-2019']*2
    data = [142,20,20,120,30,20,45,215,15,25,5]
    
    df = pd.DataFrame(data,)
    df.index = pd.DatetimeIndex(dates)
    
    grouped = df.groupby(df.index).apply(lambda x: x.max()-x.min())
    

    Output:

                product mrp
    2019-01-12          122
    2019-03-12          200
    2019-07-12           20