Search code examples
pythonpandasdata-analysis

Pandas - Find increasing trend in a row of different columns


I have a dataset of milk production in India. I am trying to get the list of 5 states(if present) in which the total Milk Production increased in last 3 years through Pandas.

State        Total10-11 Total11-12  Total13-14  Total14-15  Total15-16
Andhra Pradesh    11204      12088       13007         9656      10817
Arunachal Pradesh    28         22          43           46         50
Assam               790        797         814          829        844
Bihar              6517       6643        7197         7775       8289
Chhattisgarh       1030       1118        1208         1232       1278
Goa                  60         60          68           66         54
Gujarat            9322       4089       11112        11690      12262
Haryana            6268       6661        7442         7902       8381
Himachal Pradesh   1102       1120        1151         1173       1283

Expected output:

State
Assam
Bihar
Chhattisgarh
Haryana
Himachal Pradesh

I want to find the states which have an increasing trend in their milk production each year. The Milk production should not have decreased in the consequent years compared to the previous years. the expected output states have the production in increasing order and there is no dip even once in their production. I am kind of stuck with this one, I tried a few methods but they are not close to the right answer. What's the solution ? Thanks in advance.


Solution

  • If you are looking for the difference is always increasing you can use diff > 0 and cumsum i.e

    df = df.set_index("State/UT Name")
    
    temp = (df.T.diff() > 0).cumsum()
    # Values will increment if the difference between past and present is positive 
    State/UT Name  Andhra Pradesh  Arunachal Pradesh  Assam  Bihar  Chhattisgarh  \
    Total10-11                  0                  0      0      0             0   
    Total11-12                  1                  0      1      1             1   
    Total13-14                  2                  1      2      2             2   
    Total14-15                  2                  2      3      3             3   
    Total15-16                  3                  3      4      4             4   
    
    State/UT Name  Goa  Gujarat  Haryana  Himachal Pradesh  
    Total10-11       0        0        0                 0  
    Total11-12       0        0        1                 1  
    Total13-14       1        1        2                 2  
    Total14-15       1        2        3                 3  
    Total15-16       1        3        4                 4  
    
    # The one with max sum is the one that kept increasing over time 
    temp.sum().nlargest(10)
    
    State/UT Name
    Assam                10
    Bihar                10
    Chhattisgarh         10
    Haryana              10
    Himachal Pradesh     10
    Andhra Pradesh        8
    Arunachal Pradesh     6
    Gujarat               6
    Goa                   3
    

    If you want the state names then

    states = temp.sum().nlargest(5).index.tolist()
    
    ['Assam', 'Bihar', 'Chhattisgarh', 'Haryana', 'Himachal_Pradesh']