python numpy nlp

Using 'isin' in python for three filters


I have the following dataframe:

# Import pandas library
import pandas as pd
import numpy as np

# initialize list elements
data = ['george',
        'instagram',
        'nick',
        'basketball',
        'tennis']
  
# Create the pandas DataFrame with an explicit column name
df = pd.DataFrame(data, columns=['Unique Words'])
  
# print dataframe.
df

and I want to create a new column, based on the following two lists, that looks like this:

key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]

Label
-----
0
2
0
1
1

So words in the list key_words get the label 1, words in the list usernames get the label 2, and all other words get the label 0.

Thank you so much for your time and help!


Solution

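  • One way to do this is to build a label map: a dictionary that assigns 1 to every word in the first list and 2 to every word in the second. You can then use .map in pandas to look up each value, and fillna(0) to give every unmatched word the label 0. The final astype(int) converts the result back to integers, because the NaN values produced by .map make the column float.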

    # Import pandas library
    import pandas as pd
    import numpy as np
    
    # initialize list elements
    data = ['george',
            'instagram',
            'nick',
            'basketball',
            'tennis']
      
    # Create the pandas DataFrame with an explicit column name
    df = pd.DataFrame(data, columns=['Unique Words'])
      
    key_words = ["football", "basketball", "tennis"]
    usernames = ["instagram", "facebook", "snapchat"]
    
    
    # Map every word to its label: 1 for key_words, 2 for usernames
    label_map = {word: label
                 for label, words in enumerate([key_words, usernames], start=1)
                 for word in words}
    print(label_map)
    
    df['Label'] = df['Unique Words'].map(label_map).fillna(0).astype(int)
    
    print(df)
    

    Output

    {'football': 1, 'basketball': 1, 'tennis': 1, 'instagram': 2, 'facebook': 2, 'snapchat': 2}
    
      Unique Words  Label
    0       george      0
    1    instagram      2
    2         nick      0
    3   basketball      1
    4       tennis      1
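
  • Since the question title asks about isin, here is a minimal sketch of an alternative that builds one boolean mask per list with Series.isin and combines them with numpy.select. It is not part of the answer above. The names conditions and labels are chosen here for illustration, and the data is the same as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    ['george', 'instagram', 'nick', 'basketball', 'tennis'],
    columns=['Unique Words'],
)

key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]

# One boolean mask per list (illustrative names, not from the original answer)
conditions = [
    df['Unique Words'].isin(key_words),
    df['Unique Words'].isin(usernames),
]
labels = [1, 2]

# np.select takes the label of the first mask that is True for each row;
# rows that match neither list get the default label 0.
df['Label'] = np.select(conditions, labels, default=0)

print(df)
```

    One difference to keep in mind: if a word appeared in both lists, np.select would use the first matching condition (label 1), whereas the dictionary built in the .map approach would keep the label of the later list (2).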