I have a dataset of milk production in India. I am trying to get the list of 5 states(if present) in which the total Milk Production increased in last 3 years through Pandas.
State Total10-11 Total11-12 Total13-14 Total14-15 Total15-16
Andhra Pradesh 11204 12088 13007 9656 10817
Arunachal Pradesh 28 22 43 46 50
Assam 790 797 814 829 844
Bihar 6517 6643 7197 7775 8289
Chhattisgarh 1030 1118 1208 1232 1278
Goa 60 60 68 66 54
Gujarat 9322 4089 11112 11690 12262
Haryana 6268 6661 7442 7902 8381
Himachal Pradesh 1102 1120 1151 1173 1283
Expected output:
State
Assam
Bihar
Chhattisgarh
Haryana
Himachal Pradesh
I want to find the states which have an increasing trend in their milk production each year. The Milk production should not have decreased in the consequent years compared to the previous years. the expected output states have the production in increasing order and there is no dip even once in their production. I am kind of stuck with this one, I tried a few methods but they are not close to the right answer. What's the solution ? Thanks in advance.
If you are looking for the difference is always increasing you can use diff > 0
and cumsum
i.e
df = df.set_index("State/UT Name")
temp = (df.T.diff() > 0).cumsum()
# Values will increment if the difference between past and present is positive
State/UT Name Andhra Pradesh Arunachal Pradesh Assam Bihar Chhattisgarh \
Total10-11 0 0 0 0 0
Total11-12 1 0 1 1 1
Total13-14 2 1 2 2 2
Total14-15 2 2 3 3 3
Total15-16 3 3 4 4 4
State/UT Name Goa Gujarat Haryana Himachal Pradesh
Total10-11 0 0 0 0
Total11-12 0 0 1 1
Total13-14 1 1 2 2
Total14-15 1 2 3 3
Total15-16 1 3 4 4
# The one with max sum is the one that kept increasing over time
temp.sum().nlargest(10)
State/UT Name
Assam 10
Bihar 10
Chhattisgarh 10
Haryana 10
Himachal Pradesh 10
Andhra Pradesh 8
Arunachal Pradesh 6
Gujarat 6
Goa 3
If you want the state names then
states = temp.sum().nlargest(5).index.tolist()
['Assam', 'Bihar', 'Chhattisgarh', 'Haryana', 'Himachal_Pradesh']