I have a dataframe in the following format:
domain c1 c2 c3 c4 c5 c6 c7 c8
--- -- -- -- -- -- -- -- --
0 facebook 0 1 1 0 0 0 1 0
1 facebook 1 0 0 0 0 0 1 1
2 google 1 0 0 1 0 1 0 0
3 google 0 1 0 0 1 0 0 1
4 google 0 0 0 1 1 0 0 1
Columns other than domain can only have a value of 0 or 1.
I would like to group by domain and take the union of the remaining columns, so that the output shows, for each group, the union (logical OR) of the values in each column.
In the example data given above, I would like the output to be:
domain c1 c2 c3 c4 c5 c6 c7 c8
--- -- -- -- -- -- -- -- --
0 facebook 1 1 1 0 0 0 1 1
1 google 1 1 0 1 1 1 0 1
The group by examples I have seen group on one column and then apply aggregate functions (sum, mean, max, etc.) to the other columns. I am unable to figure out how to apply a union to the rest of the columns.
import pandas as pd
from io import StringIO
data = StringIO(u'''domain,c1,c2,c3,c4,c5,c6,c7,c8
facebook,0,1,1,0,0,0,1,0
facebook,1,0,0,0,0,0,1,1
google,1,0,0,1,0,1,0,0
google,0,1,0,0,1,0,0,1
google,0,0,0,1,1,0,0,1''')
df = pd.read_csv(data)
How about:
df.groupby('domain').agg(any).astype(int)
This will give you (with domain as the index):
c1 c2 c3 c4 c5 c6 c7 c8
domain
facebook 1 1 1 0 0 0 1 1
google 1 1 0 1 1 1 0 1
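As a possible alternative: since the columns only hold 0 or 1, the per-group maximum should be 1 exactly when at least one row in the group has a 1, which is the same as the union. The sketch below rebuilds the example data from the question and passes as_index=False so that domain stays a regular column, as in the desired output. The variable name result is just for illustration.

```python
import pandas as pd
from io import StringIO

# Same sample data as in the question.
data = StringIO('''domain,c1,c2,c3,c4,c5,c6,c7,c8
facebook,0,1,1,0,0,0,1,0
facebook,1,0,0,0,0,0,1,1
google,1,0,0,1,0,1,0,0
google,0,1,0,0,1,0,0,1
google,0,0,0,1,1,0,0,1''')
df = pd.read_csv(data)

# For 0/1 columns, max per group is 1 if any row in the group is 1.
# as_index=False keeps 'domain' as a column instead of the index.
result = df.groupby('domain', as_index=False).max()
print(result)
```

This relies on the values being strictly 0 or 1. If the columns could hold other values, converting to boolean first and using any would be the safer choice.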