Search code examples
pythonpandaspython-re

How to check if a value in the list exists in the dataframe?


Got a data frame that has 5 columns and a list that contains 20 values.

if the value in the list exactly matches any value in the columns then it has to append the value (list value) to an empty column.

list=["siper","glock","tip",............]

INPUT (DATAFRAME ) DF1:

Dataframe

DESIRED OUTPUT:

Dataframe

My code to check if the value in the list exist in data frame.

list=["siper","glock","tip",............]
df2=[]
for i in list:
  mask=np.column_stack([df[col]==i for col in df])
  df2.append(df.loc[mask.any(axis=1)])

The above code gives a list of all rows in the data frame if the value in the list matches any column but I am not sure how to append values in the list to column1 if there is any match. Also, I want to add "Unknow" to column1 if there is no match.


Solution

  • Try str.extract:

    lst = ['glock', 'siper']
    
    df['D'] = df.apply(lambda x: x.str.extract(fr"\b({'|'.join(lst)})\b")
                                  .bfill().iloc[0].fillna('unknown'), axis=1)
    print(df)
    
    # Output
                      A                B                       C        D
    0            lfkdjs            siper              ldjkslkdjq    siper
    1  the glock hammer     ldksqjflsdkj            dljkfdslkfjs    glock
    2     lfdkslkdfjsdl    dflskjfsdlkjf                  tipper  unknown
    3     fdlsjkfsldkjf  dlfjksdflkdsjfs  The glockmaster hammer  unknown