I have comma-separated numbers, and I want to match every item after `START` or before `END`, if either keyword exists.
I got most of the test cases working using
(?:.*?START|END.*)(*SKIP)(*F)|\d+
except those where `START` appears after `END`, or where multiple instances of `START` and `END` exist.
input | matches |
---|---|
123,45678,789,777,888,1234 | 123, 45678, 789, 777, 888, 1234 |
123,START,789,777,888,1234 | 789, 777, 888, 1234 |
123,45678,789,777,END,1234 | 123, 45678, 789, 777 |
123,START,789,777,END,1234 | 789, 777 |
123,END,789,777,START,1234 | 123 |
123,START,789,START,777,END,1234 | 789, 777 |
123,START,789,END,777,END,1234 | 789 |
123,END,789,START,777,END,1234 | 123 |
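
To reproduce the failing rows, here is a minimal PHP sketch that runs the pattern from the question against the three inputs where `START` follows `END` or a keyword repeats. The variable names (`$original`, `$inputs`, `$originalResults`) are illustrative and not part of the original post.

```php
<?php
// Pattern from the question, with PHP regex delimiters added.
$original = '/(?:.*?START|END.*)(*SKIP)(*F)|\d+/';

// Rows from the table where START follows END or a keyword repeats.
$inputs = [
    '123,END,789,777,START,1234',
    '123,START,789,START,777,END,1234',
    '123,END,789,START,777,END,1234',
];

$originalResults = [];
foreach ($inputs as $input) {
    preg_match_all($original, $input, $m);
    $originalResults[$input] = $m[0]; // full matches only
    echo $input, ' => ', implode(', ', $m[0]), PHP_EOL;
}
// Prints:
// 123,END,789,777,START,1234 => 1234
// 123,START,789,START,777,END,1234 => 777
// 123,END,789,START,777,END,1234 => 777
```

In each case `.*?START` runs past an earlier `END` or past an earlier `START`, so `(*SKIP)` discards items that should have been matched.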
Here's the regex101 project I was trying. I'm using PCRE2 (PHP 7.3).
You might fix your pattern by adding a restriction to find a `START` that has no `END` before it:
(?:^(?:(?!END).)*?START|END.*)(*SKIP)(*F)|\d+
// ^^^^^^^^^^^^^^^
See the regex demo.
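
As a quick check, the sketch below runs this pattern against every row of the table in the question. It assumes PHP's `preg_match_all`; the names `$fixed`, `$cases` and `$failures` are illustrative only.

```php
<?php
// Fixed pattern: START is only skipped up to when no END precedes it.
$fixed = '/(?:^(?:(?!END).)*?START|END.*)(*SKIP)(*F)|\d+/';

// input => expected matches, copied from the table in the question.
$cases = [
    '123,45678,789,777,888,1234'       => ['123', '45678', '789', '777', '888', '1234'],
    '123,START,789,777,888,1234'       => ['789', '777', '888', '1234'],
    '123,45678,789,777,END,1234'       => ['123', '45678', '789', '777'],
    '123,START,789,777,END,1234'       => ['789', '777'],
    '123,END,789,777,START,1234'       => ['123'],
    '123,START,789,START,777,END,1234' => ['789', '777'],
    '123,START,789,END,777,END,1234'   => ['789'],
    '123,END,789,START,777,END,1234'   => ['123'],
];

$failures = [];
foreach ($cases as $input => $expected) {
    preg_match_all($fixed, $input, $m);
    if ($m[0] !== $expected) {
        $failures[] = $input; // remember rows that disagree with the table
    }
}
echo $failures ? 'Failed: ' . implode(' | ', $failures) : 'All rows match', PHP_EOL;
```

Because `^` anchors the first alternative, only the first `START` can trigger the skip, and the tempered dot stops that alternative from crossing an `END`.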
Here, `^(?:(?!END).)*?START` (instead of `.*?START`) matches
- `^` - start of string
- `(?:(?!END).)*?` - any char, other than line break chars, as few as possible, that does not start an `END` char sequence
- `START` - a `START` char sequence.

You can also use
(?:\G(?!\A)|^(?:(?:(?!END).)*?START)?)(?:(?!END).)*?\K\d+
See the regex demo.
Details:
- `(?:\G(?!\A)|^(?:(?:(?!END).)*?START)?)` - either the end of the previous successful match (`\G(?!\A)`) or (`|`) start of a string (`^`) and then an optional occurrence of any text up to the first occurrence of `START` that is not preceded with `END` (`(?:(?:(?!END).)*?START)?`)
- `(?:(?!END).)*?` - any char, other than line break chars, zero or more times but as few as possible, that does not start an `END` char sequence
- `\K` - match reset operator that discards all text matched so far from the overall match memory buffer
- `\d+` - one or more digits.
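
The `\G`-based pattern can be checked the same way. This is a minimal sketch assuming PHP's `preg_match_all`; the names `$chained`, `$rows` and `$mismatches` are illustrative, and the expected values are the table rows from the question joined with commas.

```php
<?php
// Second pattern: \G chains each match to the previous one, \K drops the prefix.
$chained = '/(?:\G(?!\A)|^(?:(?:(?!END).)*?START)?)(?:(?!END).)*?\K\d+/';

// input => expected matches joined with commas (rows from the question's table).
$rows = [
    '123,45678,789,777,888,1234'       => '123,45678,789,777,888,1234',
    '123,START,789,777,888,1234'       => '789,777,888,1234',
    '123,45678,789,777,END,1234'       => '123,45678,789,777',
    '123,START,789,777,END,1234'       => '789,777',
    '123,END,789,777,START,1234'       => '123',
    '123,START,789,START,777,END,1234' => '789,777',
    '123,START,789,END,777,END,1234'   => '789',
    '123,END,789,START,777,END,1234'   => '123',
];

$mismatches = [];
foreach ($rows as $input => $expected) {
    preg_match_all($chained, $input, $m);
    // Because of \K, $m[0] holds only the digits, not the text consumed before them.
    if (implode(',', $m[0]) !== $expected) {
        $mismatches[] = $input;
    }
}
echo $mismatches ? 'Mismatch: ' . implode(' | ', $mismatches) : 'All rows match', PHP_EOL;
```

Once the chain reaches an `END`, the tempered dot cannot advance, `\G` no longer applies at later positions, and `^` only matches at the start, so nothing after `END` is returned.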