Search code examples
pythonpandasnumpypython-itertools

Sum groups of flagged items and then find max values


I'd like to sum the values grouped by positive and negatives flows and then compare them to figure out the largest negative and largest positive flows.

I think itertools is probably the way to do this but can't figure it out.

#create a data frame that shows week and value
n_rows = 30
dftest = pd.DataFrame({'week': pd.date_range('1/4/2019', periods=n_rows, freq='W'),
                      'value': np.random.randint(-100,100,size=(n_rows))})

#flag positives and negatives
def flowFinder(row):
    if row['value'] > 0:
        return "Positive"
    else:
        return "Negative"
dftest['flag'] = dftest.apply(flowFinder,axis=1)
dftest

In this example df, you'd determine that 15-19 adds up toe 249 which is the max value of all the positive flows. The max negative flow is line 5 with -98.

Edit by Scott Boston It is best if you added code that generates your dataframe instead of links to a picture.

df = pd.DataFrame({'week':pd.date_range('2019-01-06',periods=21, freq='W'), 
                   'value':[64,43,94,-19,3,-98,1,80,-7,-43,45,58,27,29,
                            -4,20,97,30,22,80,-95],
                   'flag':['Positive']*3+['Negative']+['Positive']+['Negative']+
                           ['Positive']*2+['Negative']*2+['Positive']*4+
                           ['Negative']+['Positive']*5+['Negative']})

Solution

  • You can try this:

    df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])
    

    Output:

    min    -98
    max    249
    Name: value, dtype: int64
    

    Using rename:

    df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])\
      .rename(index={'min':'Negative','max':'Positive'})
    

    Output:

    Negative    -98
    Positive    249
    Name: value, dtype: int64
    

    Update answer comment:

    df_out = df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value','week']\
               .agg({'value':'sum','week':'last'})
    df_out.loc[df_out.agg({'value':['idxmin','idxmax']}).squeeze().tolist()]
    

    Output:

          value       week
    flag                  
    4       -98 2019-02-10
    9       249 2019-05-19