Could someone help me with this? I want to label user data using Python pandas. My dataset has two columns, author and retweeted_screen_name. The label should be 1 for every row whose retweeted_screen_name value is shared with at least one other row (that is, more than one author retweeted the same account), and 0 for rows whose retweeted_screen_name value appears only once. The expected result looks like this:
Author | RT_Screen_Name | Label |
---|---|---|
Alice | John | 1 |
Sandy | John | 1 |
Lisa | Mario | 0 |
Luna | Mark | 0 |
Luna | John | 1 |
Luke | Anthony | 0 |
IIUC, try with groupby:
>>> df["Label"] = (df.groupby("RT_Screen_Name")["Author"].transform("count") > 1).astype(int)
>>> df
  Author RT_Screen_Name  Label
0  Alice           John      1
1  Sandy           John      1
2   Lisa          Mario      0
3   Luna           Mark      0
4   Luna           John      1
5   Luke        Anthony      0
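For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby/transform idea. The names rows_per_target, distinct_authors, and Label_distinct are only illustrative and not part of the original answer. The second variant rests on an assumption about the intent: if the same author can retweet the same account more than once, "count" would label that account 1 even with a single author, so counting distinct authors with "nunique" may be closer to what is wanted.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Author": ["Alice", "Sandy", "Lisa", "Luna", "Luna", "Luke"],
    "RT_Screen_Name": ["John", "John", "Mario", "Mark", "John", "Anthony"],
})

# Number of (non-null) Author rows per retweeted account,
# broadcast back to every row of that group.
rows_per_target = df.groupby("RT_Screen_Name")["Author"].transform("count")
df["Label"] = (rows_per_target > 1).astype(int)

# Variant (assumption about intent): count distinct authors instead of rows,
# so repeated retweets by one author do not trigger a 1.
distinct_authors = df.groupby("RT_Screen_Name")["Author"].transform("nunique")
df["Label_distinct"] = (distinct_authors > 1).astype(int)

print(df)
```

On the sample data both columns agree, because every row for John comes from a different author. They would differ only if an author appeared more than once for the same retweeted account.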