How can I replace this expression in chain regex (notepad++)?

I have this text:

14 two 25 three 12 four 40 five 10

I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10"

For example, when I replace (14 two ) with (14 two 14 ), the next match starts after the 14; I can't make it start after "two".

Is there an alternative? For example, could the replacement use a group that is not included in the match (a group before the match)?

please help me

Solution

This should do the trick for you:

Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)

Replacement: $1$2 $1

You will need to click the "Replace All" button for this to work. It cannot be done in one shot: it has to be repeated for as long as it can find a match. (Online PHP example)

Explanation:

\s: Match a single whitespace character (a space, in this input).
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (up to infinity).
\d+: Match as many digit characters as possible (but at least one).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character, followed by one or more digit characters, followed by an optional space character. That expression sits in a non-capturing group followed by a plus, which means the regex will try to match as many combinations of space and digit characters as it can (so you can end up with something like '14 25 12 40').

The capture group is meant to keep the value so it can be reused in the replacement. You cannot simply put the plus at the end of the capture group, without the non-capturing group inside it, because the group would then remember only the last digits captured ('12' instead of the whole '14 25 12' used to build '14 25 12 40').
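The difference between the two groupings can be checked outside Notepad++. The following is a minimal sketch that uses Python's re module as a stand-in for the Notepad++ engine (an assumption; the sample string and the variable names are made up for illustration):

```python
import re

digits = "14 25 12 "

# Plus applied directly to the capturing group:
# the group keeps only its last iteration.
last_only = re.match(r"(\s?\d+\s?)+", digits)

# Plus applied to an inner non-capturing group:
# the outer capturing group keeps the whole run.
whole_run = re.match(r"((?:\s?\d+\s?)+)", digits)

print(repr(last_only.group(1)))  # '12 '
print(repr(whole_run.group(1)))  # '14 25 12 '
```

Both patterns consume the same text; only the content remembered in group 1 differs.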

[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: Backreference to what has been captured by the first group.
(?!): Negative lookahead.
[^]: Negated character class, so [^a-zA-Z] means match any character that is not an English letter.
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't keep matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results such as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).

Basically, for every English letter we match, we look ahead to assert that the content of the first capture group (for example "14") is not already present in the digit sequence that follows.
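The lookahead check on a single letter can be tried in isolation. This is a simplified sketch in Python's re module (used here as an assumed stand-in for Notepad++), where the literal "14 " replaces the \1 backreference and the two sample strings are made up for illustration:

```python
import re

# Simplification: the literal "14 " stands in for the \1 backreference.
guard = re.compile(r"o(?![^a-zA-Z]+14 )")

# "14 " does not follow the letter: the lookahead lets it through.
fresh = guard.search("two 25 three")

# "14 " already follows the letter: the lookahead rejects it.
seen = guard.search("two 14 25")

print(fresh is not None)  # True
print(seen is None)       # True
```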

For the replacement, $1$2 $1 means: put the content of capture groups 1 and 2, add a space, then put the content of capture group 1 once more.
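To see what the replacement builds for the first match on the sample text, here is a small sketch in Python's re module (an assumed stand-in for Notepad++; note that Python writes the group references as \1 and \2 rather than $1 and $2):

```python
import re

pattern = re.compile(r"((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)")
text = "14 two 25 three 12 four 40 five 10"

# Look at the first match only and the text the replacement produces for it.
first = pattern.search(text)
expanded = first.expand(r"\1\2 \1")

print(first.groups())  # ('14 ', 'two')
print(repr(expanded))  # '14 two 14 '
```

Group 1 keeps its trailing space, which is why the expanded text ends with a space after the second "14".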