I have a dataframe with a column Status which values are either 'OPEN' or 'CLOSED'. I would like to do a groupby()
on multiple columns and use the following rule for that Status column: if there is one or more 'OPEN' values in the group then the aggregate should return 'OPEN' else 'CLOSED'
I tried the following:
df_agg = df.groupby(['col1', 'col2', 'col3'], as_index=False)
.agg({'col4': 'sum', 'Status': lambda x: np.where(x == 'OPEN', 'OPEN', 'CLOSED')}).reset_index(drop=True)
but this returns lists like ['OPEN', 'CLOSED'], ['OPEN', 'CLOSED', 'CLOSED'] and so on. Is there a better way using the agg()
function to return a single value of 'OPEN' or 'CLOSED' rather than doing the groupby()
and then doing again something like below?
np.where(df_agg['Status'].str.contains('OPEN'), 'OPEN', 'CLOSED')
Yes, it is possible with any
test if at least one match OPEN
in lambda function:
#if need test substring OPEN
lambda x: 'OPEN' if x.str.contains('OPEN').any() else 'CLOSED'
#if need test string OPEN
lambda x: 'OPEN' if x.eq('OPEN').any() else 'CLOSED'
Better is test before groupby and because OPEN > CLOSED
alpha numerically aggregate max
:
#if need test substring OPEN
df_agg = (df.assign(Status = np.where(df['Status'].str.contains('OPEN'),
'OPEN', 'CLOSED'))
.groupby(['col1', 'col2', 'col3'], as_index=False)
.agg({'col4': 'sum', 'Status': 'max'}))
#if need test string OPEN
df_agg = (df.assign(Status = np.where(df['Status'].eq('OPEN'),
'OPEN', 'CLOSED'))
.groupby(['col1', 'col2', 'col3'], as_index=False)
.agg({'col4': 'sum', 'Status': 'max'}))
Or:
#if need test substring OPEN
df_agg = (df.assign(Status = df['Status'].str.contains('OPEN'))
.groupby(['col1', 'col2', 'col3'], as_index=False)
.agg({'col4': 'sum', 'Status': 'any'})
.assign(Status = lambda x: x['Status'].map({True:'OPEN',False:'CLOSED'}))
#if need test string OPEN
df_agg = (df.assign(Status = df['Status'].eq('OPEN'))
.groupby(['col1', 'col2', 'col3'], as_index=False)
.agg({'col4': 'sum', 'Status': 'any'})
.assign(Status = lambda x: x['Status'].map({True:'OPEN',False:'CLOSED'}))