Search code examples
pandaslistsubstringassign

Assign value to new column if substring from a list appears in string


I have a dataframe df:

tags
"a,b,c,d"
"c,q,k,t"

and a list of strings I need to search for:

searchList =  ["a", "b"]

I need to add a new column to my dataframe named "topic". If a string from searchList appears in column "tags" I need to set the value in that row to bool True, otherwise bool False.

Endresult:

tags      | topic
"a,b,c,d" | True
"c,q,k,t" | False

My code so far:

searchList =  ["a", "b"]
pattern = '|'.join(searchfor)
df["topic"] = df.loc[(df["tags"].str.contains('|'.join(pattern), na=False)), True] = True

But I get the error:

KeyError: 'cannot use a single bool to index into setitem'

?


Solution

  • You can assign mask to new column, also change pattern to searchList:

    searchList =  ["a", "b"]
    df["topic"] = df["tags"].str.contains('|'.join(searchList), na=False)
    print (df)
          tags  topic
    0  a,b,c,d   True
    1  c,q,k,t  False
    

    EDIT:

    searchList =  ["a", "b"]
    df["topic"] = df["tags"].str.split(',', expand=True).isin(searchList).sum(axis=1).eq(2)
    print (df)
          tags  topic
    0  a,b,c,d   True
    1  c,q,k,t  False
    2    a,c,d  False
    

    Details:

    First use Series.str.split with expand=True for new DataFrame:

    print (df["tags"].str.split(',', expand=True))
       0  1  2     3
    0  a  b  c     d
    1  c  q  k     t
    2  a  c  d  None
    

    Then compare by DataFrame.isin for membership:

    print (df["tags"].str.split(',', expand=True).isin(searchList))
           0      1      2      3
    0   True   True  False  False
    1  False  False  False  False
    2   True  False  False  False
    

    And count Trues values by sum:

    print (df["tags"].str.split(',', expand=True).isin(searchList).sum(axis=1))
    0    2
    1    0
    2    1
    dtype: int64
    

    Last compare by Series.eq, == for mask.