Tags: python, pandas, dataframe, pandas-groupby, pandas-apply

pandas: how to perform group by and union together


I have a DataFrame in the following format:

     domain  c1  c2  c3  c4  c5  c6  c7  c8
      ---    --  --  --  --  --  --  --  --
0  facebook   0   1   1   0   0   0   1   0
1  facebook   1   0   0   0   0   0   1   1
2    google   1   0   0   1   0   1   0   0
3    google   0   1   0   0   1   0   0   1
4    google   0   0   0   1   1   0   0   1

Columns other than domain can only hold 0 or 1. I would like to group by domain and take the union of the remaining columns, so that each output column is 1 if any row in the group has a 1 in that column.

In the example data given above, I would like the output to be:

     domain  c1  c2  c3  c4  c5  c6  c7  c8
      ---    --  --  --  --  --  --  --  --
0  facebook   1   1   1   0   0   0   1   1
1    google   1   1   0   1   1   1   0   1 

The groupby examples I have seen group on one column and then apply aggregate functions (sum, mean, max, etc.) to the other columns. I cannot figure out how to apply a union to the rest of the columns.

import pandas as pd
from io import StringIO

data = StringIO(u'''domain,c1,c2,c3,c4,c5,c6,c7,c8
facebook,0,1,1,0,0,0,1,0
facebook,1,0,0,0,0,0,1,1
google,1,0,0,1,0,1,0,0
google,0,1,0,0,1,0,0,1
google,0,0,0,1,1,0,0,1''')

df = pd.read_csv(data)

Solution

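  • Since every value is 0 or 1, the union within a group is 1 wherever any row has a 1. How about aggregating with any and casting back to int: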

    df.groupby('domain').agg(any).astype(int)
    

    This will give you the following, with domain as the index:

              c1  c2  c3  c4  c5  c6  c7  c8
    domain                                  
    facebook   1   1   1   0   0   0   1   1
    google     1   1   0   1   1   1   0   1
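
As a hedged sketch of two equivalent variants: for columns that hold only 0 and 1, the union within a group is the same as the column-wise maximum, so `max()` should produce the same result without the cast. Passing `as_index=False` (or calling `reset_index()`) keeps domain as a regular column, which matches the layout requested in the question. The names `result` and `alt` are illustrative only.

```python
import pandas as pd
from io import StringIO

# Same sample data as in the question.
data = StringIO("""domain,c1,c2,c3,c4,c5,c6,c7,c8
facebook,0,1,1,0,0,0,1,0
facebook,1,0,0,0,0,0,1,1
google,1,0,0,1,0,1,0,0
google,0,1,0,0,1,0,0,1
google,0,0,0,1,1,0,0,1""")
df = pd.read_csv(data)

# Variant 1: for 0/1 flags, the per-group maximum equals the union.
# as_index=False keeps 'domain' as a column instead of the index.
result = df.groupby("domain", as_index=False).max()

# Variant 2: the built-in groupby any() method, cast back to integers,
# then reset_index() to turn 'domain' back into a column.
alt = df.groupby("domain").any().astype(int).reset_index()

print(result)
```

The `max()` variant relies on the values being strictly 0 or 1; if other values could appear, the `any()` variant is the safer reading of "union".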