I'm trying to write a regular expression that captures n words after a pattern, which was answered in this question, except I want the search to keep going for another n words if it encounters that pattern again. For example, if my main search pattern is 'x', and I want to capture a word that contains 'x' and n=3 words after it that don't contain 'x', the following string should result in three matches:
Lorem ipsum dolxor sit amet, consectetur adipiscing elit. Morxbi fringilla, dui axt tincidunt consectetur, libero arcu cursus arcxu, ut commodo lexctus magna vitxae venenatis neque.
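Expected matches:
dolxor sit amet, consectetur
Morxbi fringilla, dui axt tincidunt consectetur, libero
arcxu, ut commodo lexctus magna vitxae venenatis neque.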
Matching n=3 words after is straightforward: [^ ]*x[^ ]*(?: [^ ]*){0,3}
How to keep going if another 'x' is encountered, I'm not sure. I've tried this -- [^ ]*x[^ ]*(?: (?![^ ]*x[^ ]*)[^ ]*){0,3}
-- but it ends the match instead of continuing on to check the next n words, which, given the example above, gives six results instead of the expected three.
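For reference, here is a minimal sketch that reproduces both attempts on the sample text. The names TEXT, PLAIN and GUARDED are just labels picked for this snippet.

```python
import re

# Sample text from the question.
TEXT = (
    "Lorem ipsum dolxor sit amet, consectetur adipiscing elit. "
    "Morxbi fringilla, dui axt tincidunt consectetur, libero arcu cursus "
    "arcxu, ut commodo lexctus magna vitxae venenatis neque."
)

# First attempt: the 'x' word plus up to 3 following words, whatever they contain.
PLAIN = r"[^ ]*x[^ ]*(?: [^ ]*){0,3}"

# Second attempt: a following word is only consumed if it does not contain 'x'.
GUARDED = r"[^ ]*x[^ ]*(?: (?![^ ]*x[^ ]*)[^ ]*){0,3}"

plain_matches = re.findall(PLAIN, TEXT)
guarded_matches = re.findall(GUARDED, TEXT)

for m in guarded_matches:
    print(repr(m))
# 'dolxor sit amet, consectetur'
# 'Morxbi fringilla, dui'
# 'axt tincidunt consectetur, libero'
# 'arcxu, ut commodo'
# 'lexctus magna'
# 'vitxae venenatis neque.'
```

The first pattern swallows 'axt' as context for 'Morxbi', while the second stops at every new 'x' word, which is where the six fragments come from.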
P.S. I'm working with python.
EDIT: For context, I'm trying to get sufficient information about the surroundings of each appearance of the given pattern. (For simplicity's sake, I'm only including the words after the pattern, but it's easy to generalize to the words before it.) The problem with the first regex is that a word containing the pattern may end up with no information about its own surroundings if it gets picked up as part of the surroundings of another match. For example, the second match given the text above would be 'Morxbi fringilla, dui axt', which gives us information about what comes after 'Morxbi' but not about what comes after 'axt'. The second regex doesn't help because matches that have another match in their surroundings now lose that information, e.g., we won't know the third word that comes after 'Morxbi'.
Turns out the solution was a lot closer than I thought! The second regex, [^ ]*x[^ ]*(?: (?![^ ]*x[^ ]*)[^ ]*){0,3}
already captures the appropriate strings; the only problem is that it cuts them off instead of joining them (e.g., 'Morxbi fringilla, dui' and 'axt tincidunt consectetur, libero' instead of 'Morxbi fringilla, dui axt tincidunt consectetur, libero'). So the solution is simply to wrap the whole expression in a group and add a +
to it, allowing for the space between consecutive pieces, so that they are joined: (?:[^ ]*x[^ ]*(?: (?![^ ]*x[^ ]*)[^ ]*){0,3} ?)+(?<! )
The optional space at the end of the group lets one piece run straight into the next, and the trailing (?<! ) lookbehind keeps the overall match from ending on that space.
(example)
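As a quick check in Python, here is a minimal sketch that runs the joined pattern on the sample text from the question. TEXT and JOINED are just names picked for this snippet.

```python
import re

# Sample text from the question.
TEXT = (
    "Lorem ipsum dolxor sit amet, consectetur adipiscing elit. "
    "Morxbi fringilla, dui axt tincidunt consectetur, libero arcu cursus "
    "arcxu, ut commodo lexctus magna vitxae venenatis neque."
)

# The guarded pattern wrapped in a repeated group: the optional trailing space
# joins consecutive pieces, and the lookbehind rejects a match ending in a space.
JOINED = r"(?:[^ ]*x[^ ]*(?: (?![^ ]*x[^ ]*)[^ ]*){0,3} ?)+(?<! )"

matches = [m.group(0) for m in re.finditer(JOINED, TEXT)]

for m in matches:
    print(repr(m))
# 'dolxor sit amet, consectetur'
# 'Morxbi fringilla, dui axt tincidunt consectetur, libero'
# 'arcxu, ut commodo lexctus magna vitxae venenatis neque.'
```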
This solution can also be extended to looking for n words before and m words after the pattern: (?:(?:[^ ]* ){0,n}[^ ]*x[^ ]*(?: (?![^ ]*x[^ ]*)[^ ]*){0,m})+
(example with n=2 and m=3).
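To avoid editing the counts by hand, the pattern above could be built from its parameters. This is only a sketch: context_pattern and its arguments (token, before, after) are hypothetical names made up here, and it assumes words are separated by single spaces, as in the sample text.

```python
import re

def context_pattern(token: str, before: int, after: int) -> re.Pattern:
    """Build the n-before / m-after regex for a literal token (hypothetical helper)."""
    word = r"[^ ]*"                          # any run of non-space characters
    hit = word + re.escape(token) + word     # a word containing the token
    return re.compile(
        rf"(?:(?:{word} ){{0,{before}}}{hit}(?: (?!{hit}){word}){{0,{after}}})+"
    )

# Sample text from the question.
TEXT = (
    "Lorem ipsum dolxor sit amet, consectetur adipiscing elit. "
    "Morxbi fringilla, dui axt tincidunt consectetur, libero arcu cursus "
    "arcxu, ut commodo lexctus magna vitxae venenatis neque."
)

pattern = context_pattern("x", before=2, after=3)
matches = [m.group(0) for m in pattern.finditer(TEXT)]

for m in matches:
    print(repr(m))
# 'Lorem ipsum dolxor sit amet, consectetur'
# 'adipiscing elit. Morxbi fringilla, dui axt tincidunt consectetur, libero'
# 'arcu cursus arcxu, ut commodo lexctus magna vitxae venenatis neque.'
```

re.escape is there so that a token containing regex metacharacters (a '.' or '+', say) is treated literally.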
Thanks to @bobblebubble for making this suggestion in the comments.