regex pcre regex-lookarounds

Regex get column


Given the following ; delimited string:


a;; z
toy;d;hh 
z;
d;23
;;io;
b y;b;12

a;b;bb;;;34

I am looking to get the 3rd column, if it exists, of any line whose 1st column is not d, b y or toy.

So the desired result would be


 z

io

bb

I have this regex so far:

^(?!(d|b y|toy);([^;\r\n]*);([^;\r\n]*)).*\R

as shown in this demo

As I see it there are at least 2 issues:

the line d;23, which contains d in the first column, is matching, and it should not

the matches are not returning groupings

Any help will be appreciated


Solution

  • I think you may use

    ^(?:([^\r\n;]*)(?:;(?!(?:23|b)(?=;|$))([^\r\n;]*)(?:;([^\r\n;]*))?.*)?|.*)$
    

    See the regex demo

    • ^ - start of string (or of a line when the multiline flag is on)
    • (?:([^\r\n;]*)(?:;(?!(?:23|b)(?=;|$))([^\r\n;]*)(?:;([^\r\n;]*))?.*)?|.*) - a non-capturing group matching either of two alternatives:
      • ([^\r\n;]*)(?:;(?!(?:23|b)(?=;|$))([^\r\n;]*)(?:;([^\r\n;]*))?.*)? - the first alternative:
        • ([^\r\n;]*) - Group 1: 0+ chars other than LF, CR and ;
        • (?:;(?!(?:23|b)(?=;|$))([^\r\n;]*)(?:;([^\r\n;]*))?.*)? - an optional non-capturing group:
          • ; - a semicolon
          • (?!(?:23|b)(?=;|$))([^\r\n;]*) - Group 2: 0 or more chars other than CR, LF and ; where the column as a whole must not be equal to 23 or b
          • (?:;([^\r\n;]*))? - an optional non-capturing group matching ; and then capturing into Group 3 0+ chars other than LF, CR and ;
          • .* - any 0+ chars other than line break chars, as many as possible
      • | - or
      • .* - any 0+ chars other than line break chars, as many as possible
    • $ - end of string (or of a line when the multiline flag is on).
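
As a rough check of the breakdown above, the sketch below runs the pattern in Python's re module with the multiline flag on. Using Python is an assumption: the question is tagged PCRE, but this pattern only uses syntax that both engines share. Note that the pattern as written tests the second column against 23 or b. For comparison, the sketch also includes a hypothetical variant, FIRST_COLUMN_PATTERN, that applies the same lookahead idea to the first-column rule stated in the question (d, b y, toy). SAMPLE, ANSWER_PATTERN, FIRST_COLUMN_PATTERN and third_columns are illustrative names, not part of the original answer.

```python
import re

# Sample input from the question, one record per line.
SAMPLE = "\n".join([
    "a;; z",
    "toy;d;hh ",
    "z;",
    "d;23",
    ";;io;",
    "b y;b;12",
    "",
    "a;b;bb;;;34",
])

# The pattern from the answer, unchanged. It rejects lines whose
# SECOND column is exactly "23" or "b"; rejected lines are consumed
# by the ".*" alternative, which leaves all groups unset.
ANSWER_PATTERN = re.compile(
    r"^(?:([^\r\n;]*)(?:;(?!(?:23|b)(?=;|$))([^\r\n;]*)(?:;([^\r\n;]*))?.*)?|.*)$",
    re.MULTILINE,
)

# Hypothetical variant (an assumption, not from the answer): reject lines
# whose FIRST column is exactly "d", "b y" or "toy", then skip two columns
# and capture the third in Group 1.
FIRST_COLUMN_PATTERN = re.compile(
    r"^(?!(?:d|b y|toy)(?=;|$))[^\r\n;]*;[^\r\n;]*;([^\r\n;]*)",
    re.MULTILINE,
)


def third_columns(pattern, text, group):
    """Return the given capture group for every match where it participated."""
    return [
        m.group(group)
        for m in pattern.finditer(text)
        if m.group(group) is not None
    ]


print(third_columns(ANSWER_PATTERN, SAMPLE, 3))        # [' z', 'hh ', 'io']
print(third_columns(FIRST_COLUMN_PATTERN, SAMPLE, 1))  # [' z', 'io', 'bb']
```

The variant requires at least three columns, so a line such as z; produces no match at all rather than a match with an unset group.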