Search code examples
pythonpandasdataframemultiple-columnsquerying

counting all string values in given column of a table and grouping it based on third column


I have three columns. the table looks like this:

ID.   names     tag
1.     john.     1
2.     sam       0
3.    sam,robin. 1
4.     robin.    1

Id: type integer Names: type string Tag: type integer (just 0,1)

What I want is to find how many times each name is repeated grouped by 0 and 1. this is to be done in python.

Answer must look like

               0                 1
John           23                12
Robin          32                10
sam            9                 30

Solution

  • Using extractall and crosstab:

    s = df.names.str.extractall(r'(\w+)').reset_index(1, drop=True).join(df.tag)
    
    pd.crosstab(s[0], s['tag'])
    

    tag    0  1
    0
    john   0  1
    robin  0  2
    sam    1  1