I have data like this, it's output of a groupby:
numUsers = df.groupby(["user","isvalid"]).count()
count
user isvalid
5 0.0 1336
1.0 387
But I need to have count of count_valid and count_invalid columns for each user, like this:
count_valid count_invalid
user
5 387 1336
How can I do it in optimized way in Pandas?
You can use:
out = (df.groupby(["user","isvalid"]).count()
.rename({0: 'count_invalid', 1: 'count_valid'}, level=1)
['count'].unstack()
)
Output:
isvalid count_invalid count_valid
user
5 1336 387
Or, more generic if you have multiple columns, using a MultiIndex:
out = (df.groupby(["user","isvalid"]).count()
.unstack().rename(columns={0: 'invalid', 1: 'valid'}, level=1)
)
out.columns = out.columns.map('_'.join)
Output:
count_invalid count_valid
user
5 1336 387
Or from the original dataset with a crosstab
:
pd.crosstab(df['user'], df['isvalid'].map({0: 'count_invalid', 1: 'count_valid'}))