I have a dataframe df:
tags
"a,b,c,d"
"c,q,k,t"
and a list of strings I need to search for:
searchList = ["a", "b"]
I need to add a new column to my dataframe named "topic". If a string from searchList appears in column "tags" I need to set the value in that row to bool True, otherwise bool False.
Endresult:
tags | topic
"a,b,c,d" | True
"c,q,k,t" | False
My code so far:
searchList = ["a", "b"]
pattern = '|'.join(searchfor)
df["topic"] = df.loc[(df["tags"].str.contains('|'.join(pattern), na=False)), True] = True
But I get the error:
KeyError: 'cannot use a single bool to index into setitem'
?
You can assign mask to new column, also change pattern
to searchList
:
searchList = ["a", "b"]
df["topic"] = df["tags"].str.contains('|'.join(searchList), na=False)
print (df)
tags topic
0 a,b,c,d True
1 c,q,k,t False
EDIT:
searchList = ["a", "b"]
df["topic"] = df["tags"].str.split(',', expand=True).isin(searchList).sum(axis=1).eq(2)
print (df)
tags topic
0 a,b,c,d True
1 c,q,k,t False
2 a,c,d False
Details:
First use Series.str.split
with expand=True
for new DataFrame
:
print (df["tags"].str.split(',', expand=True))
0 1 2 3
0 a b c d
1 c q k t
2 a c d None
Then compare by DataFrame.isin
for membership:
print (df["tags"].str.split(',', expand=True).isin(searchList))
0 1 2 3
0 True True False False
1 False False False False
2 True False False False
And count True
s values by sum
:
print (df["tags"].str.split(',', expand=True).isin(searchList).sum(axis=1))
0 2
1 0
2 1
dtype: int64
Last compare by Series.eq
, ==
for mask.