Tags: regex, pcre

Regex: return only non-empty lines after substitution


Given this text string of ;-delimited columns:

a;; z
z;d;hh 
d;23
;;io;
b;b;12

a;b;bb;;;34

This regex

^(?:(a|b|z)(?:;|$)([^;\r\n]*)(?:;|$)([^;\r\n]*)(?:;.*)?|.*)$

with the substitution $3 returns the 3rd column, if it exists, of each line whose first column is a, b or z, as shown in this demo. Every other line is replaced with an empty string, so the result contains blank lines.
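A minimal Python sketch that reproduces this setup outside regex101. It assumes Python's re module is an acceptable stand-in for PCRE here (this particular pattern uses nothing PCRE-specific), uses \3, which is Python's spelling of $3, and re.MULTILINE for the m flag. The names TEXT and ORIGINAL are illustrative only. It shows that non-matching lines are emptied rather than removed:

```python
import re

# Sample input from the question (the empty line is part of the data).
TEXT = "a;; z\nz;d;hh \nd;23\n;;io;\nb;b;12\n\na;b;bb;;;34"

# The original pattern; re.MULTILINE plays the role of the m flag.
ORIGINAL = re.compile(
    r"^(?:(a|b|z)(?:;|$)([^;\r\n]*)(?:;|$)([^;\r\n]*)(?:;.*)?|.*)$",
    re.MULTILINE,
)

# re.sub replaces every match (like the g flag). An unmatched group 3
# is substituted as an empty string (Python 3.5+), so non-matching
# lines become empty, but their line breaks stay in place.
result = ORIGINAL.sub(r"\3", TEXT)
print(result.split("\n"))  # [' z', 'hh ', '', '', '12', '', 'bb']
```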

My question is: how can I return only the non-empty lines, as in:

 z
hh 
12
bb

Thanks for any help


Solution

  • You can still do that with a plain regex: you just need to move the $ anchor into each of the two alternatives and add an optional line break pattern at the end of the second one.

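    Your pattern has the ^(?:(O)(N)(E)|.*)$ structure, where (O)(N)(E) stands for your column-capturing part. The second alternative matches the whole line if the first one does not match, but both alternatives stop at the line end (you are using the multiline flag, so $ matches at every position before a line break char and at the end of the string). The line break after a non-matching line is therefore never consumed, and an empty line remains in the output. So, you need to convert the pattern to ^(?:(O)(N)(E)$|.*$\R?):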

    ^(?:(a|b|z)(?:;|$)([^;\r\n]*)(?:;|$)([^;\r\n]*)(?:;.*)?$|.*$\R?)
                                                           ^^^^^^^^^
    

    See the regex demo in the regex101 tester, and note the use of the g and m modifiers.
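    For a quick check outside regex101, here is a minimal Python sketch of the fixed pattern. Python's re module has no \R escape, so it assumes (?:\r\n|\n|\r) is an acceptable stand-in (PCRE's \R also matches a few rarer line break characters). It uses \3 as Python's spelling of $3; the names TEXT and FIXED are illustrative only.

```python
import re

# Sample input from the question.
TEXT = "a;; z\nz;d;hh \nd;23\n;;io;\nb;b;12\n\na;b;bb;;;34"

# (?:\r\n|\n|\r) is an assumed stand-in for PCRE's \R, which Python lacks.
FIXED = re.compile(
    r"^(?:(a|b|z)(?:;|$)([^;\r\n]*)(?:;|$)([^;\r\n]*)(?:;.*)?$"  # keep: stop before the line break
    r"|.*$(?:\r\n|\n|\r)?)",                                      # drop: consume the line break too
    re.MULTILINE,
)

# Every match is replaced (like the g flag); an unmatched group 3
# becomes an empty string (Python 3.5+).
result = FIXED.sub(r"\3", TEXT)
print(result)
```

    This prints " z", "hh ", "12" and "bb" on four lines: lines that match the first alternative keep their line break, while every other line is removed together with its own.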

    So, piece by piece, the pattern is:

    • ^ - start of a line
    • (?: - start of a non-capturing group (so that ^ applies to both alternatives):
      • (a|b|z)(?:;|$)([^;\r\n]*)(?:;|$)([^;\r\n]*)(?:;.*)?$ - your specific pattern capturing the necessary substrings, up to the end of the line/string
      • | - or
      • .*$ - any 0+ chars other than line break chars, as many as possible, up to the line/string end ($), and then
      • \R? - an optional line break sequence, consumed so that a non-matching line is removed together with its line break
    • ) - end of the group.
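    The breakdown above maps one-to-one onto a verbose-mode regex. The following Python sketch writes the answer's pattern with re.VERBOSE so each piece carries its explanation inline. As before, (?:\r\n|\n|\r) is an assumed stand-in for PCRE's \R, and the names ANNOTATED and TEXT are illustrative only.

```python
import re

# The answer's pattern in re.VERBOSE mode: whitespace and comments are
# ignored, so each piece can be annotated like the list above.
# (?:\r\n|\n|\r) is an assumed stand-in for PCRE's \R.
ANNOTATED = re.compile(
    r"""
    ^                       # start of a line (re.MULTILINE)
    (?:                     # non-capturing group, so ^ applies to both branches
        (a|b|z)(?:;|$)      # column 1 must be a, b or z
        ([^;\r\n]*)(?:;|$)  # column 2
        ([^;\r\n]*)         # column 3, the captured part that is kept
        (?:;.*)?            # any remaining columns
        $                   # up to the end of the line or string
    |                       # or
        .*$                 # any other line, up to its end
        (?:\r\n|\n|\r)?     # plus its optional line break, so nothing is left
    )                       # end of the group
    """,
    re.MULTILINE | re.VERBOSE,
)

TEXT = "a;; z\nz;d;hh \nd;23\n;;io;\nb;b;12\n\na;b;bb;;;34"
print(ANNOTATED.sub(r"\3", TEXT))
```

    The comments deliberately avoid backslashes: in verbose mode a backslash at the end of a comment line could escape the newline and swallow the next line of the pattern.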