I'm trying to search word-tokenized abstracts for custom stem words using Python. The following code is almost what I want. That is, does any of the values in stem_words appear at least once in word_tokenized_abstract?
if(any(word in stem_words for word in word_tokenized_abstract)):
    do stuff
where...
I based the above on "One-liner to check if at least one item in list exists in another list?"
My issue is that my stem_words are of different lengths. I've tried the following code (a modification of the above) which did not work for me. I've tried a few other modifications but they either don't work or cause a crash.
if(any(word in stem_words for word[0:len(word)] in word_tokenized_abstract)):
    do stuff
That is, do any of the values in word_tokenized_abstract begin with any of the values in stem_words?
If it helps, my stem_words = ['pancrea', 'muscul', 'derma', 'ovar']
Thanks! I apologize if this question has been answered previously but I couldn't find it.
So you want to check whether any string in one list (stem_words) is contained in any of the strings of the other list (word_tokenized_abstract).
I'd try this:
any(y.startswith(x) for y in word_tokenized_abstract for x in stem_words)
Explanation: for each stem x in stem_words, check if any string in word_tokenized_abstract starts with x.
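To make the check concrete, here is a minimal sketch. The stem_words list is the one from the question. The sample token list and the helper name has_stem_prefix are made up for illustration.

```python
stem_words = ['pancrea', 'muscul', 'derma', 'ovar']

# Hypothetical tokenized abstract, for illustration only
word_tokenized_abstract = ['the', 'pancreatic', 'tissue', 'was', 'examined']


def has_stem_prefix(tokens, stems):
    """Return True if any token starts with any of the stems."""
    return any(tok.startswith(stem) for tok in tokens for stem in stems)


if has_stem_prefix(word_tokenized_abstract, stem_words):
    print("found a word that begins with a stem")
```

Because any() short-circuits, the scan stops at the first matching token/stem pair.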
If you just want the stem to be a substring of the word then use:
any(x in y for y in word_tokenized_abstract for x in stem_words)
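The difference between the two checks shows up when a stem occurs in the middle of a word. The sketch below uses a made-up token list, and the helper names starts_with_stem and contains_stem are hypothetical. It also relies on the fact that str.startswith accepts a tuple of prefixes, which avoids the inner loop over stems.

```python
stem_words = ['pancrea', 'muscul', 'derma', 'ovar']


def starts_with_stem(tokens, stems):
    """True if any token begins with one of the stems."""
    prefixes = tuple(stems)  # str.startswith accepts a tuple of prefixes
    return any(tok.startswith(prefixes) for tok in tokens)


def contains_stem(tokens, stems):
    """True if any stem occurs anywhere inside any token."""
    return any(stem in tok for tok in tokens for stem in stems)


# Hypothetical tokens: 'derma' occurs inside 'hypodermal' but not at its start
tokens = ['hypodermal', 'layer']
print(starts_with_stem(tokens, stem_words))  # False
print(contains_stem(tokens, stem_words))     # True
```

Which variant fits depends on whether words such as 'hypodermal' should count as matches for your stems.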