The data sample is:
import re
import pandas as pd
a = pd.DataFrame({'Strings': ['i xxx iwantto iii i xxx i',
                              'and you xxx and x you xxxxxx and you and you']})
b = ['i', 'and you']
There are two words (phrases) in b. I want to find them in a as exact words, not as substrings of longer words. So I want the result to be:
['i', 'i', 'i']
['and you', 'and you', 'and you']
I need to count how many times these words occur in each string, so I do not really need the lists above. I include them only to show that I want exact-word matches. Here is my attempt:
s='r\'^'+b[0]+' | '+b[0]+' | '+b[0]+'$\''
len(re.findall(s,a.loc[0,'Strings']))
I hope s can find the words at the beginning, in the middle, and at the end of the string. Both a and b are large, so I cannot just hard-code the actual strings here. But the result is:
len(re.findall(s,a.loc[0,'Strings']))
Out[110]: 1
re.findall(s,a.loc[0,'Strings'])
Out[111]: [' i ']
It looks like only the middle occurrence is matched. I am not sure where I went wrong.
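One way to see what goes wrong is to print the pattern. The sketch below reuses b from the question and adds the illustrative names text, s_bad, and s_fixed (not from the original post). It suggests that the r' and the closing ' are ordinary characters inside the string, so only the middle alternative can ever match.

```python
import re

b = ['i', 'and you']
text = 'i xxx iwantto iii i xxx i'

# Construction from the question: the characters r' and ' become part of the pattern
s_bad = 'r\'^' + b[0] + ' | ' + b[0] + ' | ' + b[0] + '$\''
print(s_bad)                      # r'^i | i | i$'
print(re.findall(s_bad, text))    # [' i ']

# The same three alternatives without the stray quote characters
s_fixed = '^' + b[0] + ' | ' + b[0] + ' | ' + b[0] + '$'
print(re.findall(s_fixed, text))  # ['i ', ' i ', ' i']
```

Even with the quotes removed, each alternative consumes the spaces around the word, so two occurrences that share a single space (such as the final "and you and you" in the second string) cannot both be matched.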
a=pd.DataFrame({'Strings':['i xxx iwantto iii i xxx i',
'and you xxx and x you xxxxxx and you and you']})
print(a.Strings.str.findall('i |and you'))
Output:
0                   [i , i , i ]
1    [and you, and you, and you]
Name: Strings, dtype: object
print(a.Strings.str.findall('{} |{}'.format(*b)))
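The pattern 'i |and you' above relies on a trailing space, so it also matches the end of "iii " and skips an "i" at the very end of the string. Here is a minimal sketch of one alternative, assuming "exact word" means delimited by whitespace or by the start or end of the string: escape each phrase with re.escape, wrap it in the lookarounds (?<!\S) and (?!\S), and count with Series.str.count. The name counts is illustrative and not from the original post.

```python
import re
import pandas as pd

a = pd.DataFrame({'Strings': ['i xxx iwantto iii i xxx i',
                              'and you xxx and x you xxxxxx and you and you']})
b = ['i', 'and you']

# One column of counts per phrase. The lookarounds require that the phrase is
# not preceded or followed by a non-whitespace character, and they do not
# consume the surrounding spaces, so adjacent occurrences are all counted.
counts = pd.DataFrame({
    phrase: a['Strings'].str.count(r'(?<!\S)' + re.escape(phrase) + r'(?!\S)')
    for phrase in b
})
print(counts)
```

For the sample data this gives 3 for 'i' in the first string and 3 for 'and you' in the second, with 0 in the other two cells. If punctuation should also count as a delimiter, \b on both sides may be a better fit than the whitespace lookarounds.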