python pandas dataframe similarity

How to label rows in pandas based on whether their value in another column is shared with other rows


I would appreciate help with the following. I want to label user data using Python pandas. My dataset has two columns, author and retweeted_screen_name (shown below as Author and RT_Screen_Name). The label should be 1 for every row whose retweeted_screen_name value also appears in at least one other row, and 0 for rows whose value appears only once. The expected result looks like this:

Author RT_Screen_Name Label
Alice John 1
Sandy John 1
Lisa Mario 0
Luna Mark 0
Luna John 1
Luke Anthony 0

Solution

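  • If I understand correctly, you want 1 when the RT_Screen_Name value appears in more than one row and 0 otherwise. Try groupby with transform: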

    df["Label"] = (df.groupby("RT_Screen_Name")["Author"].transform("count")>1).astype(int)
    
    >>> df
      Author RT_Screen_Name  Label
    0  Alice           John      1
    1  Sandy           John      1
    2   Lisa          Mario      0
    3   Luna           Mark      0
    4   Luna           John      1
    5   Luke        Anthony      0
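
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same groupby approach. The extra columns Label_distinct and Label_dup are hypothetical names added only for comparison: the first assumes you may want to count distinct authors rather than rows, and the second shows that, for this data, Series.duplicated(keep=False) gives the same labels without a groupby.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Author": ["Alice", "Sandy", "Lisa", "Luna", "Luna", "Luke"],
    "RT_Screen_Name": ["John", "John", "Mario", "Mark", "John", "Anthony"],
})

# Number of rows per retweeted account, broadcast back to every row;
# a count above 1 means the account is shared with at least one other row
rows_per_account = df.groupby("RT_Screen_Name")["Author"].transform("count")
df["Label"] = (rows_per_account > 1).astype(int)

# Variant (assumption: repeated retweets by the same author should not count):
# compare the number of distinct authors per account instead of the row count
authors_per_account = df.groupby("RT_Screen_Name")["Author"].transform("nunique")
df["Label_distinct"] = (authors_per_account > 1).astype(int)

# Variant without groupby: flag every row whose RT_Screen_Name occurs more than once
df["Label_dup"] = df["RT_Screen_Name"].duplicated(keep=False).astype(int)

print(df)
```

On this sample all three columns agree. They would differ only if the same author retweeted the same account more than once, in which case "count" and duplicated still give 1 while "nunique" gives 0 unless a second author is present.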