Tags: javascript, regex, regex-lookarounds, regex-group

Struggling with a regex for matching inner quote+parenthesis. Do I need negative/positive look-ahead/behind?


I'm trying to run a regex against the following strings:

  1. "sonoma wildfires"
  2. sonoma and (wild* or stratus or kincade)
  3. sonoma and (wild or "stratus kincade")

... so that I get the following matches:

  1. ['"sonoma wildfires"']
  2. ['sonoma', 'and', '(wild* or stratus or kincade)']
  3. ['sonoma', 'and', '(wild or "stratus kincade")']

I'm using the following regex:

/\w+\*?|["(][^()"]+[")]/g

The first two strings match correctly.

But with the third string, I get this match:

['sonoma', 'and', '(wild or "', 'stratus', 'kincade']

... and what I want is:

['sonoma', 'and', '(wild or "stratus kincade")']

It's matching the opening parenthesis but then stopping at the first inner quote. I've been tweaking the regex with negative and positive lookarounds, but I'm having trouble figuring it out:

/\w+\*?|["(](?<!\()[^()"]+(?!\))[")]/g


Solution

  • The first pattern you tried, \w+\*?|["(][^()"]+[")], does not give the desired match because its second alternative uses one character class, ["(], for the opening delimiter and another, [")], for the closing delimiter. Nothing ties the two together, so an opening ( can be closed by a ".

    After the ( is consumed, [^()"]+ matches one or more characters other than (, ) and ". It cannot cross the double quote inside the third example, so it stops there, and [")] then accepts that quote as the closing delimiter. The match ends at the inner quote instead of reaching the closing parenthesis.
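
    To see this failure concretely, the sketch below runs the original pattern against the third input string. The variable names are illustrative only and do not come from the question.

```javascript
// Original pattern from the question: one alternative handles both quotes and parentheses.
const original = /\w+\*?|["(][^()"]+[")]/g;
const input = 'sonoma and (wild or "stratus kincade")';

// After "(" is consumed, [^()"]+ stops at the inner quote, and [")] then
// accepts that quote as the closing delimiter, so the match ends early.
const tokens = input.match(original);
console.log(tokens);
// [ 'sonoma', 'and', '(wild or "', 'stratus', 'kincade' ]
```

    The trailing ") is dropped entirely: the quote cannot start a match because [^()"]+ needs at least one character before the closing delimiter, and ) is not an allowed opening character.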


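    You don't need any lookarounds. Instead, split the second alternative in two so that parentheses and double quotes are each matched by their own alternative, giving three alternatives in total: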

    \w+\*?|\([^()]+\)|"[^"]+"
    

    Explanation

    • \w+\*? Match 1+ word chars and optional *
    • | Or
    • \([^()]+\) Match from opening till closing parenthesis using a negated character class
    • | Or
    • "[^"]+" Match from double quote to double quote using a negated character class
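
    As a rough illustration of how the three alternatives divide the work, the sketch below wraps each one in a named capture group and reports which one produced each token. The group names (word, group, phrase) and the classify helper are hypothetical additions, not part of the original answer.

```javascript
// Same three alternatives as above, wrapped in named groups (the names
// word, group, and phrase are hypothetical labels added for illustration).
const pattern = /(?<word>\w+\*?)|(?<group>\([^()]+\))|(?<phrase>"[^"]+")/g;

function classify(query) {
  // matchAll yields one match object per token; exactly one named group is defined.
  return [...query.matchAll(pattern)].map(m => {
    const [kind] = Object.entries(m.groups).find(([, value]) => value !== undefined);
    return { kind, text: m[0] };
  });
}

console.log(classify('sonoma and (wild or "stratus kincade")'));
```

    Because the parenthesis alternative uses [^()]+, it stops at the first closing parenthesis, so this approach assumes the queries never contain nested parentheses.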

    Regex demo

    // Tokenize each query with the three-alternative pattern.
    [
      `"sonoma wildfires"`,
      `sonoma and (wild* or stratus or kincade)`,
      `sonoma and (wild or "stratus kincade")`,
    ].forEach(s => console.log(s.match(/\w+\*?|\([^()]+\)|"[^"]+"/g)));