I'd like to sum the values grouped by positive and negatives flows and then compare them to figure out the largest negative and largest positive flows.
I think itertools is probably the way to do this but can't figure it out.
#create a data frame that shows week and value
n_rows = 30
dftest = pd.DataFrame({'week': pd.date_range('1/4/2019', periods=n_rows, freq='W'),
'value': np.random.randint(-100,100,size=(n_rows))})
#flag positives and negatives
def flowFinder(row):
if row['value'] > 0:
return "Positive"
else:
return "Negative"
dftest['flag'] = dftest.apply(flowFinder,axis=1)
dftest
In this example df, you'd determine that 15-19 adds up toe 249 which is the max value of all the positive flows. The max negative flow is line 5 with -98.
Edit by Scott Boston It is best if you added code that generates your dataframe instead of links to a picture.
df = pd.DataFrame({'week':pd.date_range('2019-01-06',periods=21, freq='W'),
'value':[64,43,94,-19,3,-98,1,80,-7,-43,45,58,27,29,
-4,20,97,30,22,80,-95],
'flag':['Positive']*3+['Negative']+['Positive']+['Negative']+
['Positive']*2+['Negative']*2+['Positive']*4+
['Negative']+['Positive']*5+['Negative']})
You can try this:
df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])
Output:
min -98
max 249
Name: value, dtype: int64
Using rename:
df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])\
.rename(index={'min':'Negative','max':'Positive'})
Output:
Negative -98
Positive 249
Name: value, dtype: int64
Update answer comment:
df_out = df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value','week']\
.agg({'value':'sum','week':'last'})
df_out.loc[df_out.agg({'value':['idxmin','idxmax']}).squeeze().tolist()]
Output:
value week
flag
4 -98 2019-02-10
9 249 2019-05-19