I need to extract the word "local" when it comes before "gun store". However, the function below does not return it because it uses split. Is there a way around this?
Source looks like this: As reported on 30 December 2019, in Maipu, Metropolitan region, a group of at least 10 rioters attempted to loot a local gun store.
Here is the function:
regex_filter = r'local|dozen|several|looted'
property_key = r"\b(gun store|establishments|supermarket)\b"
source = source.split()
for i, w in enumerate(source):
    if re.search(property_key, w):
        if re.match(re.compile(regex_filter, flags=re.IGNORECASE), source[i-1]):
            return source[i-1]
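split() breaks "gun store" into the separate tokens "gun" and "store", so property_key never matches it. I suggest skipping the split and extracting the word preceding any of the phrases listed in property_key with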
re.search(r"(\S+)\s+(?:gun store|establishments|supermarket)\b", text)
Or, if the word is formed with word chars and there can be any whitespace/punctuation between the words:
re.search(r"([^\W_]+)[\W_]+(?:gun store|establishments|supermarket)\b", text)
The (\S+)\s+ part matches and captures one or more non-whitespace chars into Group 1 and then matches one or more whitespace chars, while ([^\W_]+)[\W_]+ matches and captures one or more letters or digits into Group 1 and then matches one or more non-word or underscore chars.
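To make the difference between the two patterns concrete, here is a small sketch that runs both on a made-up sentence where the preceding word is wrapped in parentheses. The sample text and the variable names are my own, not from the question:

```python
import re

# Shared tail: the phrases from property_key, as a non-capturing group.
phrases = r"(?:gun store|establishments|supermarket)\b"

# Variant 1: the preceding "word" is any run of non-whitespace chars.
by_whitespace = re.compile(r"(\S+)\s+" + phrases)
# Variant 2: the preceding word is letters/digits only; any run of
# non-word chars or underscores may separate it from the phrase.
by_word_chars = re.compile(r"([^\W_]+)[\W_]+" + phrases)

# Hypothetical sample with punctuation around the word of interest.
sample = "rioters attempted to loot a (local) gun store."

ws_word = by_whitespace.search(sample).group(1)
wc_word = by_word_chars.search(sample).group(1)

print(ws_word)  # => (local)
print(wc_word)  # => local
```

The first variant keeps the surrounding punctuation, so the second one is probably the better fit if the text is not clean.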
See the Python demo:
import re
rx = r"(\S+)\s+(?:gun store|establishments|supermarket)\b"
text = "As reported on 30 December 2019, in Maipu, Metropolitan region, a group of at least 10 rioters attempted to loot a local gun store."
m = re.search(rx, text)
if m:
    print(m.group(1))
# => local
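If the preceding word should only be returned when it is one of the words in regex_filter, as in the original function, both conditions can be folded into a single pattern. This is a sketch under that assumption; find_qualifier and the constant names are hypothetical:

```python
import re

# Word lists taken from the question's regex_filter and property_key.
QUALIFIERS = r"local|dozen|several|looted"
PROPERTY = r"gun store|establishments|supermarket"

# Capture a qualifier only when it directly precedes one of the phrases.
PATTERN = re.compile(rf"\b({QUALIFIERS})\s+(?:{PROPERTY})\b", re.IGNORECASE)

def find_qualifier(text):
    """Return the qualifier preceding a property phrase, or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

text = ("As reported on 30 December 2019, in Maipu, Metropolitan region, "
        "a group of at least 10 rioters attempted to loot a local gun store.")
print(find_qualifier(text))  # => local
```

Since the whole string is searched at once, no split is needed and multi-word phrases such as "gun store" match normally.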