Regex: make a group appear only once

I am trying to run a regex query in Python and I have the following problem:

In French, the subject of a sentence can appear before or after the verb. For example, the sentence "she says" can be translated as "elle dit" or "dit-elle", where "elle" is "she" and "dit" is "says".

Is it possible to capture only sentences containing "elle" and "dit", whether the subject "elle" comes before or after the verb "dit"? I started with the following:

(elle).{0,10}(dit).{0,10}(elle)

But now I would like to make one of the (elle) groups optional when the other has been found. The * and + operators do not help in this case.

Solution

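You can use the PyPI regex module, which can be installed with pip install regex (or pip3 install regex). Unlike the built-in re module, it supports variable-width lookbehinds and reusing the same group name, both of which the pattern below relies on: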

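import regex  # third-party module: pip install regex

# Optional lookbehind and lookahead capture the subject on either side of the predicate.
p = r'(?<=\b(?P<subject>il|elle)\b.{0,10})?\b(?P<predicate>dit|mange)\b(?=.{0,10}\b(?P<subject>il|elle)\b)?'
text = 'elle dit et dit-elle et il mange ... dit-il'
print([x.groupdict() for x in regex.finditer(p, text, regex.S)])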


The pattern may be created dynamically from variables:

subjects = ['il', 'elle']
predicates = ['dit', 'mange']
# Literal braces must be doubled inside an f-string: {{0,10}} yields {0,10}.
p = fr'(?<=\b(?P<subject>{"|".join(subjects)})\b.{{0,10}})?\b(?P<predicate>{"|".join(predicates)})\b(?=.{{0,10}}\b(?P<subject>{"|".join(subjects)})\b)?'
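
If the word lists might contain regex metacharacters, the construction can be wrapped in a small function that escapes each word first. This is only a minimal sketch: the helper name build_pattern and its max_gap parameter are assumptions for illustration, not part of the regex module. It only builds the pattern string, which is still meant to be passed to regex.finditer.

```python
import re


def build_pattern(subjects, predicates, max_gap=10):
    """Build the pattern string for the third-party regex module.

    build_pattern and max_gap are hypothetical names used for illustration.
    """
    # Escape each word so characters such as "." or "?" are matched literally.
    subj = "|".join(re.escape(s) for s in subjects)
    pred = "|".join(re.escape(v) for v in predicates)
    # Build the gap quantifier separately to avoid brace doubling in the f-strings.
    gap = ".{0,%d}" % max_gap
    return (
        rf"(?<=\b(?P<subject>{subj})\b{gap})?"   # optional subject before
        rf"\b(?P<predicate>{pred})\b"            # the predicate itself
        rf"(?={gap}\b(?P<subject>{subj})\b)?"    # optional subject after
    )


p = build_pattern(["il", "elle"], ["dit", "mange"])
```

With the default max_gap of 10 and these word lists, the returned string is identical to the hard-coded pattern shown earlier.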

Details

(?<=\b(?P<subject>il|elle)\b.{0,10})? - an optional lookbehind that captures a whole word il or elle ending 0 to 10 chars before the predicate
\b(?P<predicate>dit|mange)\b - a whole word dit or mange
(?=.{0,10}\b(?P<subject>il|elle)\b)? - an optional lookahead that captures a whole word il or elle starting 0 to 10 chars after the predicate.
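
If installing a third-party module is not an option, a different technique works with the built-in re module: plain alternation with one branch per word order. The sketch below is an assumption-laden alternative, not the approach above. The group names s1, p1, p2, s2 and the helper find_pairs are made up for illustration, and because each match consumes both words, a word cannot be shared between two pairs the way it can with lookarounds.

```python
import re

SUBJECT = r"(?:il|elle)"
PREDICATE = r"(?:dit|mange)"

# Branch 1: subject, then predicate. Branch 2: predicate, then subject.
# re does not allow reusing a group name, so each branch gets its own names.
PATTERN = re.compile(
    rf"\b(?P<s1>{SUBJECT})\b.{{0,10}}?\b(?P<p1>{PREDICATE})\b"
    rf"|\b(?P<p2>{PREDICATE})\b.{{0,10}}?\b(?P<s2>{SUBJECT})\b",
    re.S,
)


def find_pairs(text):
    """Return (subject, predicate) tuples in order of appearance."""
    pairs = []
    for m in PATTERN.finditer(text):
        # Exactly one branch matched, so one group of each kind is None.
        subject = m.group("s1") or m.group("s2")
        predicate = m.group("p1") or m.group("p2")
        pairs.append((subject, predicate))
    return pairs


print(find_pairs("elle dit et dit-elle et il mange ... dit-il"))
# [('elle', 'dit'), ('elle', 'dit'), ('il', 'mange'), ('il', 'dit')]
```

The lazy .{0,10}? keeps each match as short as possible, so a subject pairs with the nearest predicate within the 10-character window.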