This question focuses on PCRE regular expressions as used by `grep -P`.
Imagine I have a string `abcRabcSxyxz` and search for a substring which starts with `abc` and ends with `x`, but with the restriction that no shorter substring of this match would also match.
My first attempt was a non-greedy regexp,

```shell
grep -Po 'abc.*?x' <<<abcRabcSxyxz
```

but this returns `abcRabcSx`, while I would like to find just `abcSx`. It is obvious why even my non-greedy attempt still produces a match which is too long: the engine reports the leftmost match, and the non-greedy `.*?` only shortens where that match ends, not where it starts. I need the regexp engine to try harder. My second attempt was
```shell
grep -Po '(?>abc.*?)x' <<<abcRabcSxyxz
```

which did not produce a match at all (maybe I don't really understand the usage of `(?>...)` explained here).
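The two attempts can be compared side by side in a small script. This is a sketch that assumes bash (for the `<<<` here-string the question already uses) and a grep built with PCRE support; the comments describe the usual backtracking behaviour behind each result rather than anything grep-specific.

```shell
#!/usr/bin/env bash
# Assumes bash (for <<<) and a grep built with PCRE support (-P).
subject='abcRabcSxyxz'

# Attempt 1: lazy quantifier. Start positions are tried left to right, and
# index 0 already yields a match, so .*? can only shorten where it ends.
lazy=$(grep -Po 'abc.*?x' <<< "$subject")
echo "lazy:   $lazy"    # prints: lazy:   abcRabcSx

# Attempt 2: atomic group. Inside (?>...) the lazy .*? first matches the
# empty string; once the group is left, the engine may not backtrack into
# it to let .*? grow, so 'x' would have to follow "abc" directly.
status=0
atomic=$(grep -Po '(?>abc.*?)x' <<< "$subject") || status=$?
echo "atomic: exit status $status, output '$atomic'"   # grep exits 1: no match
```

The `|| status=$?` form records grep's "no match" exit status without aborting a script that runs under `set -e`.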
Is there an easy solution to my problem?
UPDATE: I see from the comments that my example does not precisely explain what I am searching for, so here is a more general description:
I am searching for matches of the form `PXQ`, where `P`, `X` and `Q` are arbitrary patterns, and `X` should not contain a match of `P`. Plus, I don't want to literally retype the pattern `P` inside `X`.
For instance,

`[(][^(]*[)]`

would be a possible (but not satisfying) solution for the concrete case where I am searching for a parenthesized expression that does not contain another parenthesized expression (here, `P` is `[(]`, `X` is an arbitrary string, and `Q` is `[)]`). But even this example shows that I have to literally repeat the information contained in `P` when specifying the middle part (`[^(]*`) to make sure that `P` is not contained there. I am looking for a way which makes this explicit repetition unnecessary.
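For reference, the explicit-repetition pattern does find the innermost group on a small sample. The input string below is made up for illustration, and the sketch again assumes bash and a PCRE-enabled grep; the `[^(]*` part is exactly where `P` has to be retyped by hand.

```shell
#!/usr/bin/env bash
# Assumes bash and a grep built with PCRE support (-P).
# P = [(], X = [^(]* (P retyped by hand as a negated class), Q = [)].
subject='(a(b)c'   # hypothetical sample input
innermost=$(grep -Po '[(][^(]*[)]' <<< "$subject")
echo "$innermost"   # prints: (b)
```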
Interesting question. Much of this was worked out in the comments; thanks to Casimir et Hippolyte, Felix Kling, and user1934428.
The solution uses PCRE and is as follows:

```shell
grep -Po '(abc)(?:(?!(?1)).)*?x' <<< abcRabcSxyxz
```
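Running the command from a script gives the match the question asked for. A minimal sketch, again assuming bash and a PCRE-enabled grep: the attempt starting at index 0 fails because the loop is not allowed to step onto the second `abc`, so the engine moves on and succeeds at the second `abc`.

```shell
#!/usr/bin/env bash
# Assumes bash and a grep built with PCRE support (-P).
subject='abcRabcSxyxz'

# (abc) is capturing group 1; (?1) inside the negative lookahead re-runs
# that group's pattern, so "abc" is written only once.
shortest=$(grep -Po '(abc)(?:(?!(?1)).)*?x' <<< "$subject")
echo "$shortest"   # prints: abcSx
```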
We know the result will start with "abc" and end in "x". So let us walk through how this result works.
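- `(abc)` to start. This is the first capturing group, so its pattern can be referred to later as `(?1)`.
- `(` followed by `?:` opens a non-capturing group, which prevents the subpattern from capturing or being counted as a group.
- `(?!` opens a negative lookahead, and `(?1)` inside it re-runs the pattern of group 1, `abc`; the lookahead fails whenever `abc` starts at the current position.
- `)` closes the lookahead, then `.` matches any character, in this case matching the `S`.
- `)*?` closes the non-capturing group and repeats it un-greedily, matching as few as zero times.
- `x`, which the question designated as the ending character.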
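Because `P` appears only once, inside group 1, the same shape extends to the general `PXQ` case from the question. The shell function `pxq_matches` below is hypothetical (it is not part of the answer or of grep); it simply splices `P` and `Q` into the pattern. Since `P` becomes group 1, numbered backreferences such as `\1` inside `P` or `Q` would no longer point where intended, so treat this as a sketch for simple patterns.

```shell
#!/usr/bin/env bash
# Assumes bash and a grep built with PCRE support (-P).
# Hypothetical helper: print each match of P X Q where X contains no match of P.
pxq_matches() {
  local p=$1 q=$2
  # Single quotes around the fixed parts keep "!" away from interactive
  # history expansion; only $p and $q are expanded.
  grep -Po '('"$p"')(?:(?!(?1)).)*?'"$q"
}

pxq_matches 'abc' 'x' <<< 'abcRabcSxyxz'   # prints: abcSx
pxq_matches '[(]' '[)]' <<< '(a(b)c'       # prints: (b)
```

The second call covers the parenthesized example from the question without retyping `[(]` inside the middle part.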