I am trying to find the position of a string (word) in a sentence using the function below. It works perfectly for most words, but for the string GLC-SX-MM= in the sentence
I have a lot of GLC-SX-MM= in my inventory list
I cannot get a match. I tried escaping - and =, but that does not work. Any ideas? I cannot split the sentence on spaces because some of my keys are compound words separated by a space.
    import re

    def get_start_end(self, sentence, key):
        r = re.compile(r'\b(%s)\b' % key, re.I)
        m = r.search(sentence)
        start = m.start()
        end = m.end()
        return start, end
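The trailing \b in your pattern can never match after GLC-SX-MM=, because both = and the space that follows it are non-word characters, so there is no word boundary at that position. Escape the key, since you are looking for a literal string, and use the unambiguous (?<!\w) and (?!\w) boundaries instead: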
    import re

    def get_start_end(self, sentence, key):
        # Escape the key so it is matched literally; the lookarounds act as word boundaries
        r = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(key)), re.I)
        m = r.search(sentence)
        start = m.start()
        end = m.end()
        return start, end
The r'(?<!\w){}(?!\w)'.format(re.escape(key)) expression builds a regex like (?<!\w)abc\.def\=(?!\w) out of the keyword abc.def= (on Python 3.7 and later, re.escape leaves = unescaped, which matches identically). The (?<!\w) lookbehind fails the match if there is a word character immediately to the left of the keyword, and the (?!\w) lookahead fails the match if there is a word character immediately to the right of it.
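To see the difference between the two kinds of boundary, here is a minimal, self-contained sketch using the sentence from the question. It assumes a standalone function (no self) that returns None when nothing matches. Both of those choices are illustrative and not part of the original code.

```python
import re

def get_start_end(sentence, key):
    """Return (start, end) of the first whole-word match of key, or None.

    Standalone variant for illustration; returning None on no match
    is an assumption, the original code would raise AttributeError.
    """
    pattern = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(key)), re.I)
    m = pattern.search(sentence)
    return (m.start(), m.end()) if m else None

sentence = "I have a lot of GLC-SX-MM= in my inventory list"
key = "GLC-SX-MM="

# With \b the match fails even when the key is escaped: "=" and the
# following space are both non-word characters, so \b cannot match there.
print(re.search(r'\b(%s)\b' % re.escape(key), sentence, re.I))  # None

# The lookaround boundaries only require that no word character is adjacent.
print(get_start_end(sentence, key))               # (16, 26)
print(get_start_end(sentence, "inventory list"))  # (33, 47)

# A key that ends in the middle of a word is still rejected.
print(get_start_end(sentence, "GLC-SX-M"))        # None
```

Compound keys that contain a space, such as "inventory list", work the same way, so there is no need to split the sentence first.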