Below is the basic data I'm provided with every month. There are many department related files I get and the job gets very monotonous and repetitive.
Month,year,sales,
January,2017,34400,
February,2017,35530,
March,2017,34920,
April,2017,35950,
May,2017,36230,
June,2017,36820,
July,2017,34590,
August,2017,36500,
September,2017,36600,
October,2017,37140,
November,2017,36790,
December,2017,43500,
January,2018,34900,
February,2018,37700,
March,2018,37900,
April,2018,38100,
May,2018,37800,
June,2018,38500,
July,2018,39400,
August,2018,39700,
September,2018,39980,
October,2018,40600,
November,2018,39100,
December,2018,46600,
January,2019,42500,
I've tried to use certain functions like value_count(sadly, giving only summary) in order to achieve this output. And failed. (See output below.)
I need to autofill the 3rd and 4th columns (with fillna=True/False)
They only allow Apache OpenOffice for our job, and hence no Excel. But we have the permission by IT to install Python.
The solution in this Link is not helping me as they are grouping-by two columns. The columns in my output are inter-dependent.
import pandas as pd
df = pd.read_csv("Test_1.csv", "a")
df['comparative_position'] = df['sales'].diff().fillna=True
df.loc[df['comparative_position'] > 0.0, 'comparative_position'] = "Profit"
df.loc[df['comparative_position'] < 0.0, 'comparative_position'] = "Loss"
Month,Year,Sales,comparative_position,Months_in_P(or)L,Highest_in_12Months
January,2016,34400,NaN,NaN,NaN
February,2016,35530,Profit,1,NaN
March,2016,34920,Loss,1,NaN
April,2016,35950,Profit,1,NaN
May,2016,36230,Profit,2,NaN
June,2016,36820,Profit,3,NaN
July,2016,34590,Loss,1,NaN
August,2016,36500,Profit,1,NaN
September,2016,36600,Profit,2,NaN
October,2016,37140,Profit,3,NaN
November,2016,36790,Loss,1,NaN
December,2016,43500,Profit,1,43500
January,2017,34900,Loss,1,43500
February,2017,37700,Profit,1,43500
March,2017,37900,Profit,2,43500
April,2017,38100,Profit,3,43500
May,2017,37800,Loss,1,43500
June,2017,38500,Profit,1,43500
July,2017,39400,Profit,2,43500
August,2017,39700,Profit,3,43500
September,2017,39980,Profit,4,43500
October,2017,40600,Profit,5,43500
November,2017,39100,Loss,1,43500
December,2017,46600,Profit,1,46600
January,2018,42500,Loss,1,46600
AFAIU this should work for you:
# Get difference from previous as True / False
df['P/L'] = df.sales > df.sales.shift()
# Add column counting 'streaks' of P or L
df['streak'] = df['P/L'].groupby(df['P/L'].ne(df['P/L'].shift()).cumsum()).cumcount()
# map True/False to string of Profit/Loss
df['P/L'] = df['P/L'].map({True:'Profit', False:'Loss'})
# max of last n months where n is 12, as in your example, you can change it to any int
df['12_max'] = df.sales.rolling(12).max()
Output:
Month year sales P/L streak 12_max
0 January 2017 34400 False 0 NaN
1 February 2017 35530 True 0 NaN
2 March 2017 34920 False 0 NaN
3 April 2017 35950 True 0 NaN
4 May 2017 36230 True 1 NaN
5 June 2017 36820 True 2 NaN
6 July 2017 34590 False 0 NaN
7 August 2017 36500 True 0 NaN
8 September 2017 36600 True 1 NaN
9 October 2017 37140 True 2 NaN
10 November 2017 36790 False 0 NaN
11 December 2017 43500 True 0 43500.0
12 January 2018 34900 False 0 43500.0
13 February 2018 37700 True 0 43500.0
14 March 2018 37900 True 1 43500.0
15 April 2018 38100 True 2 43500.0
16 May 2018 37800 False 0 43500.0
17 June 2018 38500 True 0 43500.0
18 July 2018 39400 True 1 43500.0
19 August 2018 39700 True 2 43500.0
20 September 2018 39980 True 3 43500.0
21 October 2018 40600 True 4 43500.0
22 November 2018 39100 False 0 43500.0
23 December 2018 46600 True 0 46600.0
24 January 2019 42500 False 0 46600.0