regex perl pcre

Perl regex to match commas not inside parentheses or nested parentheses


I have a comma-separated string and I want to match every comma that is not inside parentheses (the parentheses are guaranteed to be balanced).

a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))

The commas between a and (b), (b) and (d$_,c), and (d$_,c) and ((,),d,(,)) should match, but not the ones inside (d$_,c) or ((,),d,(,)).

Note: Eventually I want to split the string by these commas.

I tried this regex (found elsewhere): (?!<(?:\(|\[)[^)\]]+),(?![^(\[]+(?:\)|\])) but it only works for non-nested parentheses.


Solution

  • A single regex for this would be overcomplicated and difficult to maintain or extend. Here is an iterative parser approach that tokenizes the string and tracks the nesting depth:

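    use strict;
    use warnings;
    
    my $str = 'a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))';
    
    my $nesting = 0;   # current parenthesis depth
    my $buffer = '';   # text of the field being collected
    my @vals;
    # Each token is a single comma or parenthesis, or a run of other characters.
    while ($str =~ m/\G([,()]|[^,()]+)/g) {
      my $token = $1;
      if ($token eq ',' and !$nesting) {
        # A comma at depth 0 ends the current field.
        push @vals, $buffer;
        $buffer = '';
      } else {
        # Everything else, including nested commas, belongs to the field.
        $buffer .= $token;
        if ($token eq '(') {
          $nesting++;
        } elsif ($token eq ')') {
          $nesting--;
        }
      }
    }
    # Keep whatever follows the last top-level comma.
    push @vals, $buffer if length $buffer;
    
    # Fields are printed with their surrounding whitespace intact.
    print "$_\n" for @vals;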
    

    You can use Parser::MGC to construct this sort of parser more abstractly.
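
If the split is needed in more than one place, the tokenizer loop above could be wrapped in a small sub. The following is only a sketch: the name split_top_level is hypothetical, and the whitespace trimming, the check for unbalanced parentheses, and keeping an empty final field are additions that the original answer does not make.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): split $str on
# commas at nesting depth 0 and trim whitespace around each field.
sub split_top_level {
    my ($str) = @_;
    my $depth  = 0;
    my $buffer = '';
    my @fields;

    # Same tokens as the loop above: a single comma or parenthesis,
    # or a run of any other characters.
    while ($str =~ m/\G([,()]|[^,()]+)/g) {
        my $token = $1;
        if ($token eq ',' and $depth == 0) {
            push @fields, $buffer;
            $buffer = '';
            next;
        }
        $buffer .= $token;
        if ($token eq '(') {
            $depth++;
        }
        elsif ($token eq ')') {
            # Assumption: report a stray ')' rather than let the depth go negative.
            die "Unbalanced ')' in input\n" if --$depth < 0;
        }
    }
    die "Unbalanced '(' in input\n" if $depth > 0;

    # Assumption: always keep the final field, even when it is empty.
    push @fields, $buffer;
    s/^\s+|\s+$//g for @fields;
    return @fields;
}

my @fields = split_top_level('a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))');
print "[$_]\n" for @fields;
```

For the sample string this prints four bracketed fields: [a], [(b)], [(d$_,c)] and [((,),d,(,))]. Dying on unbalanced input is a design choice for the sketch; since the question guarantees balanced parentheses, those checks only guard against unexpected data.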