Search code examples
pythonpandasaggregate

Pandas merging value of two rows in columns of a single row


I have data like this, it's output of a groupby:

numUsers = df.groupby(["user","isvalid"]).count()

                      count     
user       isvalid               
5          0.0         1336  
           1.0          387 

But I need to have count of count_valid and count_invalid columns for each user, like this:

                    count_valid  count_invalid
user 
5                           387           1336
           

How can I do it in optimized way in Pandas?


Solution

  • You can use:

    out = (df.groupby(["user","isvalid"]).count()
             .rename({0: 'count_invalid', 1: 'count_valid'}, level=1)
             ['count'].unstack()
           )
    

    Output:

    isvalid  count_invalid  count_valid
    user                               
    5                 1336          387
    

    Or, more generic if you have multiple columns, using a MultiIndex:

    out = (df.groupby(["user","isvalid"]).count()
             .unstack().rename(columns={0: 'invalid', 1: 'valid'}, level=1)
           )
    out.columns = out.columns.map('_'.join)
    

    Output:

          count_invalid  count_valid
    user                            
    5              1336          387
    

    Or from the original dataset with a crosstab:

    pd.crosstab(df['user'], df['isvalid'].map({0: 'count_invalid', 1: 'count_valid'}))