I have a column (user_id) with thousands of user_id (duplicated also), and another column (string) with several different strings. I would like to check if a unique user_id has a certain string and store the user id and 1 in a new dataframe. If the user_id does not have the string, then I would like to store the user_id and 0. In this case, I'd like to give a 1 to the string “good”, and a 0 to any other string.
I can't seem to find a good way of doing so without looping endlessly.
INITIAL PANDAS DATAFRAME
user_id | string |
---|---|
1 | good |
1 | better |
1 | good |
2 | good |
2 | worse |
3 | worse |
3 | even worse |
DESIRED PANDAS DATAFRAME
user_id | is string |
---|---|
1 | 1 |
2 | 1 |
3 | 0 |
To keep it vectorized, we can skip the apply
:
df['string'].eq('good').groupby(df['user_id']).any().astype(int).reset_index(name='is string')
user_id is string
0 1 1
1 2 1
2 3 0