I have a comma separated string and I want to match every comma that is not in parenthesis (parenthesis are guaranteed to be balanced).
a , (b) , (d$_,c) , ((,),d,(,))
The commas between a and (b), (b) and (d$,c), (d$,c) and ((,),d,(,)) should match but not inside (d$_,c) or ((,),d,(,)).
Note: Eventually I want to split the string by these commas.
It tried this regex:
(?!<(?:\(|\[)[^)\]]+),(?![^(\[]+(?:\)|\]))
from here but it only works for non-nested parenthesis.
A single regex for this is massively overcomplicated and difficult to maintain or extend. Here is an iterative parser approach:
use strict;
use warnings;
my $str = 'a , (b) , (d$_,c) , ((,),d,(,))';
my $nesting = 0;
my $buffer = '';
my @vals;
while ($str =~ m/\G([,()]|[^,()]+)/g) {
my $token = $1;
if ($token eq ',' and !$nesting) {
push @vals, $buffer;
$buffer = '';
} else {
$buffer .= $token;
if ($token eq '(') {
$nesting++;
} elsif ($token eq ')') {
$nesting--;
}
}
}
push @vals, $buffer if length $buffer;
print "$_\n" for @vals;
You can use Parser::MGC to construct this sort of parser more abstractly.