import re
name = "John"
#In these examples it works fine
input_sense_aux = "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer"
#input_sense_aux = "Do you know if John with the others could come this afternoon?"
#In these examples it does not work well
#input_sense_aux = "John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "Can you help us, otherwise it will be waiting for a while longer for John"
#input_sense_aux = "sorry! can you help us? otherwise it will be waiting for a while longer for John"
regex_patron_m1 = r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # Check whether the regex matches, i.e. whether the block below is entered
if m1:
    something_1, something_2 = m1.groups()
    something_1 = something_1.strip()
    something_2 = something_2.strip()
    print(repr(something_1))
    print(repr(something_2))
I need the regex to grab the content before "John" like this:
(start of sentence|¿|¡|,|;|:|(|[|.) \s* "content for something_1" \s* John
And then:
John \s* "content for something_2" \s* (end of sentence|?|!|,|;|:|)|]|.)
In the first examples, the regex works fine:
'these teams are too many but I know that'
'can help us'
'Do you know if'
'with the others could come this afternoon'
But for the last 3 examples the regex does not return anything.
I need help generalizing my regex so that it covers all these cases while still respecting the conditions under which it must extract the content of something_1 and something_2.
For the last 3 examples, the expected results are:
''
' can help us'
' otherwise it will be waiting for a while longer for '
''
' otherwise it will be waiting for a while longer for '
''
You can use
import re
name = "John"
input_sense_auxs = [
    "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer",
    "Do you know if John with the others could come this afternoon?",
    "John can help us, otherwise it will be waiting for a while longer",
    "Can you help us, otherwise it will be waiting for a while longer for John",
    "sorry! can you help us? otherwise it will be waiting for a while longer for John",
]
regex_patron_m1 = fr'(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)?{name}(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).])'
# r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
for input_sense_aux in input_sense_auxs:
    print(f'--- {input_sense_aux} ---')
    m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # Check whether the regex matches, i.e. whether the block below is entered
    if m1:
        something_1, something_2 = m1.groups()
        # An optional group that did not participate in the match is None
        something_1 = something_1.strip() if something_1 else ""
        something_2 = something_2.strip() if something_2 else ""
        print(repr(something_1))
        print(repr(something_2))
Output:
--- These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer ---
'I think'
'can help us'
--- These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- Do you know if John with the others could come this afternoon? ---
'Do you know if'
'with the others could come this afternoon'
--- John can help us, otherwise it will be waiting for a while longer ---
''
'can help us'
--- Can you help us, otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
--- sorry! can you help us? otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
See the Python demo.
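If name can contain characters that are special in a regex (a dot in "J. Smith", for instance), interpolating it directly changes what the pattern matches. A minimal sketch of one way to guard against that, assuming a hypothetical helper called split_around_name (not part of the code above) and using re.escape on the name:

```python
import re

# Hypothetical helper, not from the original code: builds the same pattern
# but escapes `name` so that characters such as "." are matched literally.
def split_around_name(text, name):
    pattern = (
        r'(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)?'
        + re.escape(name)
        + r'(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).])'
    )
    m = re.search(pattern, text, re.IGNORECASE)
    if not m:
        return None
    before, after = m.groups()
    # A group that did not participate in the match is None; map it to ""
    return (before or "").strip(), (after or "").strip()

print(split_around_name("John can help us, otherwise it will be waiting for a while longer", "John"))
print(split_around_name("Do you know if John with the others could come this afternoon?", "John"))
```

Returning None when nothing matches keeps the "no match" case distinct from the case where the name is found with nothing on either side.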
Details:
- (?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)? - the prefix, the left-hand side part, that matches
  - (?:^|[?!¿¡,;:([.]) - either start of string or a char from the ?!¿¡,;:([. set
  - \s* - zero or more whitespaces
  - (?:(\w+(?:\s+\w+)*)\s*)? - an optional occurrence of
    - (\w+(?:\s+\w+)*) - Group 1: one or more word chars and then zero or more sequences of one or more whitespaces and one or more word chars
    - \s* - zero or more whitespaces
- John - the name
- (?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).]) - the right-hand part:
  - (?:\s*(\w+(?:\s+\w+)*))? - an optional occurrence of
    - \s* - zero or more whitespaces
    - (\w+(?:\s+\w+)*) - Group 2: one or more word chars and then zero or more occurrences of one or more whitespaces followed with one or more word chars
  - \s* - zero or more whitespaces
  - (?:$|[]?!,;:).]) - end of string or a char from the ]?!,;:). charset.

See the regex demo.
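The breakdown above can also be written into the pattern itself. The following is only a sketch, assuming the re.VERBOSE flag is acceptable here; re.escape(name) is an addition that is not in the original pattern, used so that spaces or dots in the name stay literal in verbose mode:

```python
import re

name = "John"

# Same pattern, laid out with re.VERBOSE so each piece carries its own comment.
# re.escape(name) is an assumption added here, not part of the original pattern.
pattern = re.compile(rf'''
    (?:^|[?!¿¡,;:([.])        # start of string or an opening delimiter
    \s*
    (?:(\w+(?:\s+\w+)*)\s*)?  # optional Group 1: words before the name
    {re.escape(name)}         # the name itself
    (?:\s*(\w+(?:\s+\w+)*))?  # optional Group 2: words after the name
    \s*
    (?:$|[]?!,;:).])          # end of string or a closing delimiter
''', re.VERBOSE | re.IGNORECASE)

m = pattern.search("sorry! can you help us? otherwise it will be waiting for a while longer for John")
if m:
    print(m.groups())
```

In verbose mode, whitespace outside character classes is ignored, so the layout does not change what the pattern matches.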