I have a dictionary of keys & values (massively truncated for ease of reading):
responsePolarities = {'yes':0.95, 'hell yes':0.99, 'no':-0.95, 'hell no':-0.99, 'okay':0.70}
I am doing a check to see if any key is in a string passed to my function:
for key, value in responsePolarities.items():
    if key in string:
        return value
The problem is that if the passed string contains a word such as "know", the function sees the 'no' in 'know' and returns -0.95.
I can't add spaces around the 'no' key because it could be the only response provided.
How can I make the function see 'no' as 'no' but not 'know'? Am I correct in thinking this is probably going to need to be a RegExp job, or is there something more simple I'm missing?
I thought about splitting my passed string into individual words, but then I couldn't check for multi-word phrases that modify the response polarity (like no vs. hell no)...
If I understand this correctly, you want to match text that contains your keys, but only if the whole word matches. You can do this using the regex word boundary delimiter \b. It will match when the word is separated by punctuation, as in ':no,', but not when it is adjacent to other word characters, as in 'know'. Here you loop through some strings and for each find the matching keys in the dictionary:
import re

responsePolarities = {'yes':0.95, 'hell yes':0.99, 'no':-0.95, 'hell no':-0.99, 'okay':0.70}
strings = [
    'I know nothing',
    'I now think the answer is no',
    'hell, mayb yes',
    'or hell yes',
    'i thought:yes or maybe--hell yes--'
]
for s in strings:
    for k, v in responsePolarities.items():
        # \b on each side makes the key match only as a whole word or phrase;
        # re.escape guards against keys that contain regex metacharacters
        if re.search(rf"\b{re.escape(k)}\b", s):
            print(f"'{s}' matches: {k} : {v}")
'I know nothing' shouldn't match anything. The matches should look like:
'I now think the answer is no' matches: no : -0.95
'hell, mayb yes' matches: yes : 0.95
'or hell yes' matches: yes : 0.95
'or hell yes' matches: hell yes : 0.99
'i thought:yes or maybe--hell yes--' matches: yes : 0.95
'i thought:yes or maybe--hell yes--' matches: hell yes : 0.99
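Since 'or hell yes' matches both 'yes' and 'hell yes', a function that returns a single value has to pick one of them. Here is a minimal sketch that tries the longer keys first, on the assumption that the longer phrase is the more specific one. The name get_polarity and the choice to return None when nothing matches are mine, not from the question:

```python
import re

responsePolarities = {'yes': 0.95, 'hell yes': 0.99, 'no': -0.95, 'hell no': -0.99, 'okay': 0.70}

def get_polarity(text, polarities=responsePolarities):
    # Hypothetical helper: check longer keys first so 'hell yes' wins over 'yes'.
    for key in sorted(polarities, key=len, reverse=True):
        if re.search(rf"\b{re.escape(key)}\b", text):
            return polarities[key]
    return None  # no key found as a whole word or phrase

print(get_polarity('or hell yes'))     # 0.99
print(get_polarity('I know nothing'))  # None
print(get_polarity('no'))              # -0.95
```

If a string contains two unrelated keys (say both 'yes' and 'no'), this still returns only the longest one, so you may want a different tie-breaking rule for your data.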
If you are doing a lot of searches, you might consider precompiling the regexes before the loop.
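One way precompiling could look, as a rough sketch: build a dictionary of compiled patterns once, then reuse it for every string. The names compiled and find_matches are only for illustration:

```python
import re

responsePolarities = {'yes': 0.95, 'hell yes': 0.99, 'no': -0.95, 'hell no': -0.99, 'okay': 0.70}

# Compile each whole-word pattern once, outside the search loop.
compiled = {k: re.compile(rf"\b{re.escape(k)}\b") for k in responsePolarities}

def find_matches(text):
    # Hypothetical helper: return every (key, value) pair whose key
    # appears in text as a whole word or phrase.
    return [(k, responsePolarities[k]) for k, pattern in compiled.items() if pattern.search(text)]

print(find_matches('i thought:yes or maybe--hell yes--'))
# [('yes', 0.95), ('hell yes', 0.99)]
```

The re module also caches recently used patterns internally, so the gain is probably modest unless the dictionary is large or the loop is very hot.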