python dataframe append contains

Finding keywords in a column and adding those keywords in a new column against the same row


I am new to Python and this is my first post on Stack Overflow. I have a list of keywords and a DataFrame containing multiple columns.

I want to search for these keywords in a particular column and, for each row, write the keyword that was found into a new column.

This is what I am doing (linked image: "My code").

This is the error I am getting (linked image: "The loop with the error").

This is what I want to get (linked image: "Desired output").

Please help me figure out what is going wrong, or suggest a better way to do this. Thanks! I am writing the code out below in case it makes things easier.

import pandas as pd

keywords = ["hello","hi","greetings","wassup"]

data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]

df = pd.DataFrame(data,columns = ['strings'])

df['Keywords'] = ""

df2 = pd.DataFrame(data = None, columns = df.columns)

for word in keywords:
     temp = df[df['strings'].str.contains(word,na = False)]
     temp.reset_index(drop = True)
     temp['Keywords'] = word
     df2.append(temp)

Error:

C:\Users\harka\Anaconda3\lib\site-packages\ipykernel_launcher.py:5: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy """


Solution

  • I added 'yo' to the keyword list to show that a single row can return multiple keywords

    import pandas as pd
    
    def keyword(row):
      # Collect every keyword that occurs in this row's text, ignoring case
      strings = row['strings']
      keywords = ["hello","hi","greetings","wassup",'yo']
      matches = [key for key in keywords if key.upper() in strings.upper()]
      return matches
    
    data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
    
    df = pd.DataFrame(data,columns = ['strings'])
    df['keyword'] = df.apply(keyword, axis=1)
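
A possible alternative, sketched here under the assumption that plain substring matching is what you want: build one regular expression from the keyword list and let `Series.str.findall` collect the matches for every row at once, without a row-wise `apply`. The names `pattern` and `keywords` are only illustrative. Note that this returns matches in the order they appear in the text, not in keyword-list order.

```python
import re
import pandas as pd

keywords = ["hello", "hi", "greetings", "wassup", "yo"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup",
        "Greetings fellow traveller", "Hey im Henry", "Hello there General Kenobi"]

df = pd.DataFrame(data, columns=["strings"])

# Join the keywords into a single alternation; re.escape keeps any
# regex metacharacters in a keyword from being interpreted.
pattern = "|".join(re.escape(k) for k in keywords)

# Lower-case the text first so matching is case-insensitive and the
# returned keywords come back in lower case.
df["keyword"] = df["strings"].str.lower().str.findall(pattern)

print(df)
```

If a keyword should only count as a whole word (so that "hi" does not match inside "this"), one option is to wrap the pattern as `r"\b(?:" + pattern + r")\b"`.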
    

    If you would rather not get a list of strings back, you can return a comma-separated string instead:

    import pandas as pd
    
    def keyword(row):
      # Same matching as above, but joined into one comma-separated string
      strings = row['strings']
      keywords = ["hello","hi","greetings","wassup",'yo']
      matches = [key for key in keywords if key.upper() in strings.upper()]
      return ','.join(matches)
    
    data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
    
    df = pd.DataFrame(data,columns = ['strings'])
    df['keyword'] = df.apply(keyword, axis=1)
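
As for what went wrong in the original loop, here is a rough sketch of how it could be repaired while keeping the same approach. The `SettingWithCopyWarning` appears because `temp` is a filtered selection of `df`, and pandas cannot tell whether assigning a column to it is meant to affect `df`; taking an explicit `.copy()` makes the intent clear. Separately, `reset_index` and `append` return new objects instead of modifying in place, so their results were being thrown away, and `DataFrame.append` was removed in pandas 2.0. The list name `matches` is just illustrative, and `case=False` is added on the assumption that matching should ignore case.

```python
import pandas as pd

keywords = ["hello", "hi", "greetings", "wassup"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup",
        "Greetings fellow traveller", "Hey im Henry", "Hello there General Kenobi"]

df = pd.DataFrame(data, columns=["strings"])

matches = []
for word in keywords:
    # Boolean mask of rows whose text contains the keyword, ignoring case
    mask = df["strings"].str.contains(word, case=False, na=False)
    # Explicit copy, so assigning a new column does not trigger the warning
    temp = df[mask].copy()
    temp["Keywords"] = word
    matches.append(temp)

# Concatenate the per-keyword frames once; ignore_index renumbers the rows
df2 = pd.concat(matches, ignore_index=True)

print(df2)
```

This gives one row per (string, keyword) match, so rows with no keyword, such as "Hey im Henry", drop out, and a string containing two keywords would appear twice.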