Search code examples
pythonpython-3.xpandasdataframegroup-by

Count if a column has a specific value in another column


I have a column (user_id) with thousands of user_id (duplicated also), and another column (string) with several different strings. I would like to check if a unique user_id has a certain string and store the user id and 1 in a new dataframe. If the user_id does not have the string, then I would like to store the user_id and 0. In this case, I'd like to give a 1 to the string “good”, and a 0 to any other string.

I can't seem to find a good way of doing so without looping endlessly.

INITIAL PANDAS DATAFRAME

user_id string
1 good
1 better
1 good
2 good
2 worse
3 worse
3 even worse

DESIRED PANDAS DATAFRAME

user_id is string
1 1
2 1
3 0

Solution

  • To keep it vectorized, we can skip the apply:

    df['string'].eq('good').groupby(df['user_id']).any().astype(int).reset_index(name='is string')
    
       user_id  is string
    0        1          1
    1        2          1
    2        3          0