javascript regex regex-greedy

Empty match on end token due to greediness


I want to parse the space around items of a comma-separated list (the indices are the interesting part for me). Imagine parsing the arguments of a function call.

I'm using the regex ^\s*|\s*,\s*|\s*,?\s*$ for this, which works as intended for most cases. In particular, the beginning and the end should match with empty matches iff there is no whitespace (and/or a comma for the end). E.g. foo has 2 matches, one at 0-0 and one at 3-3.

Unfortunately, a non-empty match at the end of the string is also followed by an empty match at the very end. Consider the following example (at regex):

[Screenshots of the example]

Here, the fifth match (23-23) is unintended. I assume this match is found due to the greedy nature of *. However, one cannot apply the ? operator to the end token $ to make it non-greedy.
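The unintended match can be reproduced with a short script. This is only a sketch: `matchRanges` is a hypothetical helper name, and the input is assumed to be the third example from the edit below (with `_` standing for a space), which may or may not be the exact string from the screenshots.

```javascript
// Hypothetical helper: list the start-end index range of every match.
function matchRanges(regex, text) {
  return [...text.matchAll(regex)].map(
    (m) => `${m.index}-${m.index + m[0].length}`
  );
}

// The regex from the question, with the global flag so matchAll can be used.
const original = /^\s*|\s*,\s*|\s*,?\s*$/g;

// Assumed input: underscores are turned into spaces to keep the literal readable.
const input = "___foo,_bar,_no_match,_".replace(/_/g, " ");

const ranges = matchRanges(original, input);
console.log(ranges);
// → [ '0-3', '6-8', '11-13', '21-23', '23-23' ]
```

The middle alternative consumes the trailing `, ` as 21-23, and the next attempt starts at index 23, where the last alternative can still match the empty string in front of `$`.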

Is there a way to express my intended behavior (without the empty match at the end) using JavaScript regexes?

Edit: here are some examples (using _ instead of spaces for clarity)

  • foo: 2 matches (0-0, 3-3)
  • foo_,_bar: 3 matches (0-0, 3-6, 9-9)
  • ___foo,_bar,_no_match,_: 4 matches (0-3, 6-8, 11-13, 21-23)
  • foo_,bar_,: 3 matches (0-0, 3-5, 8-10)
  • _foo_: 2 matches (0-1, 4-5)

Solution

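  • Add a negative lookahead to the middle alternative, intended to keep it from matching at the end.

    Then add a negative lookbehind to the last alternative so that it cannot start right after whitespace or a comma. This prevents a second, empty match when the preceding match has already consumed the trailing whitespace or comma.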

    ^\s*|\s*,\s*(?!\s+$)|(?<![\s,])\s*,?\s*$
    

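As a quick check, the regex above can be run against the examples from the question. This is a minimal sketch: `matchRanges` and `cases` are hypothetical names, underscores are converted to spaces as in the question, and lookbehind assertions are assumed to be available (they were added in ES2018).

```javascript
// Hypothetical helper: list the start-end index range of every match.
function matchRanges(regex, text) {
  return [...text.matchAll(regex)].map(
    (m) => `${m.index}-${m.index + m[0].length}`
  );
}

// The regex from the answer, with the global flag so matchAll can be used.
const fixed = /^\s*|\s*,\s*(?!\s+$)|(?<![\s,])\s*,?\s*$/g;

// Examples from the question: [input with _ for spaces, expected ranges].
const cases = [
  ["foo", ["0-0", "3-3"]],
  ["foo_,_bar", ["0-0", "3-6", "9-9"]],
  ["___foo,_bar,_no_match,_", ["0-3", "6-8", "11-13", "21-23"]],
  ["foo_,bar_,", ["0-0", "3-5", "8-10"]],
  ["_foo_", ["0-1", "4-5"]],
];

// Collect every case whose actual ranges differ from the expected ones.
const failures = cases.filter(([shown, expected]) => {
  const actual = matchRanges(fixed, shown.replace(/_/g, " "));
  console.log(shown, "→", actual.join(", "));
  return actual.join(",") !== expected.join(",");
});

console.log(failures.length === 0 ? "all examples match" : failures);
```

In the third example the trailing `, ` is still matched as 21-23, and the lookbehind then rejects the empty match at index 23 because the preceding character is whitespace.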