Why do nested parentheses cause empty strings in this regex?

var str = "ab((cd))ef";
var arr = str.split(/([\)\(])/);
console.log(arr); // ["ab", "(", "", "(", "cd", ")", "", ")", "ef"] 

What I want to achieve is this:

["ab", "(", "(", "cd", ")", ")", "ef"] 

Solution

  • The outer parentheses in your regular expression act as a capturing group. From the documentation of split (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split):

    If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array.
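To see the effect of the capturing group in isolation, here is a minimal sketch that splits the same string with and without capturing parentheses. The sample string "a-b-c" and the variable names are made up for illustration and do not come from the question.

```javascript
// Hypothetical sample input, chosen only to keep the output short.
var sample = "a-b-c";

// No capturing group: the matched separators are dropped.
var plain = sample.split(/-/);            // ["a", "b", "c"]

// Capturing group: each matched separator is spliced into the result.
var captured = sample.split(/(-)/);       // ["a", "-", "b", "-", "c"]

// Non-capturing group (?:...): groups without splicing anything in.
var nonCapturing = sample.split(/(?:-)/); // ["a", "b", "c"]

console.log(plain, captured, nonCapturing);
```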

    You didn't say exactly what you want to achieve with your regex. Perhaps you want something like this:

    var str = "ab((cd))ef";
    var arr = str.split(/[\)\(]+/);
    console.log(arr); // ["ab", "cd", "ef"] 
    

    EDIT:

    Each parenthesis matches the regex individually, so the array grows like this (one line per parenthesis matched):

    ['ab', '('] // matched (
    ['ab', '(', '', '('] // matched (; the empty string is the text between the last two matches
    ['ab', '(', '', '(', 'cd', ')'] // matched )
    ['ab', '(', '', '(', 'cd', ')', '', ')'] // matched )
    ['ab', '(', '', '(', 'cd', ')', '', ')', 'ef'] // string end
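The trace above can be reproduced with a simplified, hand-written loop. This is only a sketch of the idea, not how the engine implements split: the helper name splitKeepingSeparators is hypothetical, and it assumes a global regex with exactly one capturing group that never matches the empty string.

```javascript
// Simplified stand-in for split with one capturing group, written only
// to show where the empty strings come from (hypothetical helper).
function splitKeepingSeparators(str, re) {
  var result = [];
  var lastEnd = 0;
  var match;
  // re needs the global flag so that exec() advances through the string.
  while ((match = re.exec(str)) !== null) {
    // Text between the previous match and this one; "" when they are adjacent.
    result.push(str.slice(lastEnd, match.index));
    // The captured separator itself.
    result.push(match[1]);
    lastEnd = match.index + match[0].length;
  }
  // Whatever remains after the last match.
  result.push(str.slice(lastEnd));
  return result;
}

var pieces = splitKeepingSeparators("ab((cd))ef", /([()])/g);
console.log(pieces); // ["ab", "(", "", "(", "cd", ")", "", ")", "ef"]
```

The two adjacent parentheses have nothing between them, so the slice between their matches is the empty string.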
    

    EDIT2:

    Required output is: ["ab", "(", "(", "cd", ")", ")", "ef"]

    I am not sure you can do that with a plain split on this regex alone. The simplest and safest way is to filter out the empty strings afterwards.

    var str = "ab((cd))ef";
    var arr = str.split(/([\)\(])/).filter(function(item) { return item !== '';});
    console.log(arr); // ["ab", "(", "(", "cd", ")", ")", "ef"]
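As a possible alternative that avoids filtering, and not part of the answer above, the tokens can be collected directly with String.prototype.match instead of split. The sketch assumes that every character is either a parenthesis or part of a run of non-parenthesis characters. The variable name tokens is illustrative.

```javascript
var str = "ab((cd))ef";
// Match either a single parenthesis, or a run of one or more
// characters that are not parentheses.
var tokens = str.match(/[()]|[^()]+/g);
console.log(tokens); // ["ab", "(", "(", "cd", ")", ")", "ef"]
```

Note that match returns null rather than an empty array when nothing matches (for example on an empty string), so a caller may want to fall back to [] in that case.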