How to test if a RegExp contains capturing groups in its definition?


I'm trying to write two regular expressions:

  1. One to test whether another RegExp contains capturing groups, like in /ab(c)d/
  2. Same as 1., but only detecting named capturing groups, like /ab(?<name>c)d/

These "meta"-regexes would check the source property of a regex.

Here is my best attempt for 1: /(?<!\\)\((?!\?:)/. The idea is to look for an opening parenthesis that is not preceded by \ and not followed by ?: (which would make it a non-capturing group). But this has false positives (/c[z(a]d/, for example) and false negatives (/a\\(b)/, where the backslash is itself escaped).

My attempt at 2 follows the same logic and thus has the same flaws: /(?<!\\)\(\?<(?![=!])/
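
The two failure modes can be reproduced with a quick check along these lines. The variable names are only illustrative, and the two named-group inputs are hypothetical analogues of the examples above:

```javascript
// Illustrative names; the patterns are the two attempts quoted above.
const anyGroupAttempt = /(?<!\\)\((?!\?:)/;
const namedGroupAttempt = /(?<!\\)\(\?<(?![=!])/;

// False positive: the "(" sits inside a character class, so it is a literal.
console.log(anyGroupAttempt.test(/c[z(a]d/.source)); // true, but there is no group

// False negative: the backslash before "(" is itself escaped, so the group is real.
console.log(anyGroupAttempt.test(/a\\(b)/.source)); // false, but (b) captures

// The named-group attempt fails in the same two ways.
console.log(namedGroupAttempt.test(/c[(?<n>]d/.source)); // true, but there is no group
console.log(namedGroupAttempt.test(/a\\(?<n>b)/.source)); // false, but (?<n>b) captures
```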

Any idea on how to do this properly? Thank you.


Solution

  • You could use a regex that, besides spotting capture groups, also matches escape pairs (\\.) and character classes \[(?:\\.|.)*?\] (the latter being escape-aware as well), so that parentheses inside them cause no false positives or negatives. Then loop over the matches and keep only the ones that are capture groups.

    The snippet below returns the number of anonymous capture groups and the names of the named capture groups:

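    // Alternatives, tried in this order at each position:
    //   \\.                  an escape pair, e.g. \( or \\
    //   \[(?:\\.|.)*?\]      a character class (escape-aware)
    //   (\()(?!\?)           group 1: "(" opening an anonymous capture group
    //   \(\?<([^=!][^>]*)    group 2: the name of a named capture group
    const reParser = /\\.|\[(?:\\.|.)*?\]|(\()(?!\?)|\(\?<([^=!][^>]*)/g;
    function captureGroups(regex) {
        const names = [];
        let numAnonymous = 0;
        for (const [, anon, name] of regex.source.matchAll(reParser)) {
            if (name) names.push(name);
            else if (anon) numAnonymous++;
        }
        return { numAnonymous, names };
    }
    
    // Example run: logs { numAnonymous: 2, names: [ 'xy' ] }
    console.log(captureGroups(/test[12\](3]*(?<xy>((\.))?)/g));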

    If you only need to know whether a capture group exists, you can first replace every escape pair and character class in the source with a single placeholder character. All that remains is to recognise the capture group pattern:

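    function hasCaptureGroups(regex) {
        // Collapse escape pairs and character classes into a harmless "x".
        const simpler = regex.source.replace(/\\.|\[(?:\\.|.)*?\]/g, "x");
        return {
            hasAnonymous: /\([^?]/.test(simpler),
            // Exclude "(?<=" and "(?<!", which are lookbehinds, not named groups.
            hasNamed: /\(\?<[^=!]/.test(simpler)
        };
    }
    
    // Example run: logs { hasAnonymous: true, hasNamed: true }
    console.log(hasCaptureGroups(/test[12\](3]*(?<xy>((\.))?)/g));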

    To do this with a single regular expression and no replacement step, match any input that does not contain the capture group, and then negate that. A negative look-ahead at the very first position, scanning the complete input, does the negation:

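    // Each alternative starts with a different character ("\", "[", "(" or anything
    // else), so the scan cannot backtrack into re-reading an escape pair or a class.
    const reAnonymousGroup = /^(?!(?:\\.|\[(?:\\.|[^\]\\])*\]|[^(\\[]|\(\?)*$)/;
    // "(?<=" and "(?<!" are lookbehinds, so only "(?<" not followed by "=" or "!" counts.
    const reNamedGroup     = /^(?!(?:\\.|\[(?:\\.|[^\]\\])*\]|[^(\\[]|\((?!\?<[^=!]))*$)/;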
    
    // Example run
    const regex = /test[12\](3]*(?<xy>((\.))?)/g;
    console.log("has anonymous group:", reAnonymousGroup.test(regex.source));
    console.log("has named group:", reNamedGroup.test(regex.source));
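
As a sanity check for any of these source-based approaches, one option is to let the regex engine do the parsing. Appending | to the source adds an empty alternative, so the pattern always matches the empty string, and the resulting match array has one slot per capture group. This is a different technique from the ones above, not a replacement for them. countGroupsViaEngine is a hypothetical helper name, and the sketch assumes an engine with named-group support (ES2018 or later) and no duplicate group names:

```javascript
// Hypothetical helper: counts capture groups by asking the engine, not by parsing.
function countGroupsViaEngine(regex) {
    // "source|" always matches "" through the empty alternative.
    // The flags are kept because they can affect how the pattern is parsed.
    const match = new RegExp(regex.source + "|", regex.flags).exec("");
    // match[0] is the whole match; every further slot is one capture group.
    // match.groups is undefined when the pattern has no named groups.
    const names = Object.keys(match.groups || {});
    return { numAnonymous: match.length - 1 - names.length, names };
}

// Same example as above: two anonymous groups and one group named "xy".
console.log(countGroupsViaEngine(/test[12\](3]*(?<xy>((\.))?)/g));
// The two tricky inputs from the question: no group, and one anonymous group.
console.log(countGroupsViaEngine(/c[z(a]d/));
console.log(countGroupsViaEngine(/a\\(b)/));
```

Comparing this result with the output of a source-based check over a list of tricky patterns is a cheap way to catch false positives and false negatives.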