
Basic PCRE expression to omit a double quoted substring within a string


Given a string such as 'aaa"bbb"ccc', I am trying to write a basic PCRE-style regular expression that returns only 'aaa' and 'ccc', leaving out the double-quoted "bbb". If the string contains no double quotes, e.g. 'aaabbbccc', the expression should return the whole string, 'aaabbbccc'. This is what I have tried:

~/ % pcretest
PCRE version 8.45 2021-06-15

  re> "^(.*?)(?:\".*\")?(.*)$"
data> aaa"bbb"ccc
 0: aaa"bbb"ccc
 1:
 2: aaa"bbb"ccc
  
  re> "^(.*?)(?:\".*\")(.*)$"
data> aaa"bbb"ccc
 0: aaa"bbb"ccc
 1: aaa
 2: ccc
data> aaabbbccc
No match

The first regex captures nothing useful: group 1 is empty and group 2 holds the whole string. The second regex meets the first requirement by returning 'aaa' and 'ccc', but it reports "No match" for 'aaabbbccc'. How can I write one expression that handles both cases?


Solution

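  • Try this regex, which replaces the lazy .*? with the negated character class [^"]* so that the first group runs up to the first double quote before the optional quoted part is tried: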

    $ pcre2test
    PCRE2 version 10.39 2021-10-29
      re> '^([^"]*)(?:"[^"]*")?(.*)$'
    data> aaa"bbb"ccc
     0: aaa"bbb"ccc
     1: aaa
     2: ccc
    data> aaabbbccc
     0: aaabbbccc
     1: aaabbbccc
     2: 
    data>
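
For anyone who wants to try the same expression outside pcre2test, here is a minimal sketch using Python's built-in re module. It assumes Python's engine handles this particular pattern the same way PCRE2 does (it uses only basic constructs), and the helper name split_around_quotes is made up for illustration.

```python
import re

# Pattern taken from the answer above. Assumption: Python's re module
# treats it the same way PCRE2 does, since it uses only basic syntax.
PATTERN = re.compile(r'^([^"]*)(?:"[^"]*")?(.*)$')


def split_around_quotes(text):
    """Return (before, after) around the first double-quoted substring.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_around_quotes('aaa"bbb"ccc'))  # ('aaa', 'ccc')
print(split_around_quotes('aaabbbccc'))    # ('aaabbbccc', '')
```

Note that a lone, unclosed double quote is not dropped: for 'aaa"bbb' the optional group cannot match, so the second capture holds '"bbb'.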