Problem with long regex patterns using named capturing groups

I've created and initially validated the following regex. It is intended to be part of a tokenizer that recognizes and groups certain patterns before passing them to a parser.

(?<FUNCTION> (?!CUP\(\d+\)|CDN\(\d+\))[A-Z]+\(\d*\)|[A-Z]+\(\d+,\d+\))
| (?<NUMBER>\d+|\d*\.\d+)
| (?<RELATION>==|<=|>=|!=|<|>|CUP\(\d+\)|CDN\(\d+\))
| (?<EOL>;)
| (?<OPENPAR>\()
| (?<CLOSEPAR>\))
| (?<OPERATION>\*|\+|-|\/])
| (?<SPACE>\s+)
| (?<ERROR>.)

With the above, relying on properties of the regex engine (alternatives are tried from left to right) together with subpatterns ordered by decreasing complexity, I am able to catch all groups correctly. Below is sample text that matches as expected, with the correct parts captured in the named groups (on regex101: https://regex101.com/r/nvRyjt/2). So far so good.
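Java's engine, like PCRE, tries the alternatives of an alternation from left to right and keeps the first one that succeeds, so this ordering approach should carry over to java.util.regex. The same rule applies inside a group, and the NUMBER group currently lists the simpler alternative first. A minimal sketch (the class and method names are illustrative only) showing that `\d+|\d*\.\d+` splits a decimal such as 3.14, while swapping the alternatives keeps it whole:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrder {

    // Collects every substring that find() reports, in order
    static List<String> tokens(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // NUMBER as written: the simpler \d+ is tried first and wins on "3"
        System.out.println(tokens("\\d+|\\d*\\.\\d+", "3.14")); // [3, .14]
        // More complex alternative first: the decimal stays in one piece
        System.out.println(tokens("\\d*\\.\\d+|\\d+", "3.14")); // [3.14]
    }
}
```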

TICK()<=.(2*STDEVEMA(14))+EMA(14);
TREND(14,2)==UP(2);
TICK()>EMA(14);
TREND(14)==UP(1);
EMA(14)CUP(2)EMA(28);
EMA(14)CDN(2)EMA(28)
2*3==6;
ssst2222  \\\\///???
sfgjsf

The problem starts when I try to use this expression with java.util.regex. The pattern catches only a few occurrences of the smaller groups, and the remaining recognizable parts of the input text are skipped or presented as 'nulls'. I have tried many combinations, including limiting the pattern to two or three groups, but without a clear conclusion on what leads to the unexpected behaviour. Searching through the questions posted so far, I notice that regex101 (and PCRE in general) is not a good tool for validating regexes that are later used in Java :-). So I would like to ask the following:
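Part of the 'nulls' is expected behaviour with this kind of pattern: each match is produced by exactly one top-level alternative, so Matcher.group returns null for every other named group of that match. A tokenizer therefore has to ask which group took part. The sketch below is one way to do that, assuming the group names are listed by hand in the same order as in the pattern. It compiles the pattern with Pattern.COMMENTS so the layout spaces are ignored, and it changes the OPERATION group to end in `/` (an assumption about the intent, explained in the code comment). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupTokenizer {

    // Group names in the order they appear in the pattern (listed by hand)
    static final String[] GROUPS = {
        "FUNCTION", "NUMBER", "RELATION", "EOL", "OPENPAR",
        "CLOSEPAR", "OPERATION", "SPACE", "ERROR"
    };

    // The question's pattern; COMMENTS makes the layout spaces insignificant.
    // OPERATION ends in "/" instead of "\/]", which would only match the pair "/]".
    static final Pattern TOKEN = Pattern.compile(
          "(?<FUNCTION> (?!CUP\\(\\d+\\)|CDN\\(\\d+\\))[A-Z]+\\(\\d*\\)|[A-Z]+\\(\\d+,\\d+\\))"
        + " | (?<NUMBER>\\d+|\\d*\\.\\d+)"
        + " | (?<RELATION>==|<=|>=|!=|<|>|CUP\\(\\d+\\)|CDN\\(\\d+\\))"
        + " | (?<EOL>;)"
        + " | (?<OPENPAR>\\()"
        + " | (?<CLOSEPAR>\\))"
        + " | (?<OPERATION>\\*|\\+|-|/)"
        + " | (?<SPACE>\\s+)"
        + " | (?<ERROR>.)",
        Pattern.COMMENTS);

    // Returns "GROUP:text" for every match, naming the one group that is not null
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String name : GROUPS) {
                String text = m.group(name);
                if (text != null) {
                    tokens.add(name + ":" + text);
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("EMA(14)CUP(2)EMA(28);"));
        // [FUNCTION:EMA(14), RELATION:CUP(2), FUNCTION:EMA(28), EOL:;]
    }
}
```

With the COMMENTS flag set, this version of the pattern keeps CUP(2) together as a RELATION in Java as well.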

Is anyone aware of an in-depth description of how the Pattern and Matcher classes (and especially the regex engine) work?
Have you had similar problems with more complicated than average regex patterns (maybe java.util.regex has some not-so-obvious limitations)?

One more specific thing concerns the negative lookahead. A construction like this:

(?<FUNCTION> (?!CUP\(\d+\)|CDN\(\d+\))[A-Z]+\(\d*\)|[A-Z]+\(\d+,\d+\))

skips only the first character 'C' and presents the remaining part of the keyword that was supposed to be excluded (CUP or CDN) as valid (UP or DN in this case). Any thoughts on this? Thank you in advance!
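A lookahead only constrains the position where a match attempt starts. If the attempt at 'C' does not produce a token that keeps CUP(2) together, the 'C' is either skipped by find() or consumed by a catch-all such as ERROR, and the next attempt starts at 'U'. There, UP(2) does not begin with CUP, so the negative lookahead passes. The sketch below reproduces this with a reduced pattern (FUNCTION plus ERROR only, chosen to isolate the effect) and shows one possible guard, a `(?<![A-Z])` lookbehind, assuming a function name is never directly preceded by another capital letter. The class and constant names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadGuard {

    // Reduced pattern: FUNCTION from the question, plus ERROR as a catch-all
    static final String FUNCTION_ONLY =
        "(?<FUNCTION>(?!CUP\\(\\d+\\)|CDN\\(\\d+\\))[A-Z]+\\(\\d*\\)|[A-Z]+\\(\\d+,\\d+\\))"
        + "|(?<ERROR>.)";

    // Same pattern with (?<![A-Z]) in front, so a FUNCTION match cannot start
    // in the middle of a run of capital letters
    static final String GUARDED =
        "(?<FUNCTION>(?<![A-Z])(?:(?!CUP\\(\\d+\\)|CDN\\(\\d+\\))[A-Z]+\\(\\d*\\)|[A-Z]+\\(\\d+,\\d+\\)))"
        + "|(?<ERROR>.)";

    // Labels each match with the group that produced it
    static List<String> tokenize(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String name = m.group("FUNCTION") != null ? "FUNCTION" : "ERROR";
            tokens.add(name + ":" + m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The lookahead rejects CUP(2) at 'C', then accepts UP(2) one position later
        System.out.println(tokenize(FUNCTION_ONLY, "CUP(2)"));
        // [ERROR:C, FUNCTION:UP(2)]
        System.out.println(tokenize(GUARDED, "CUP(2)"));
        // [ERROR:C, ERROR:U, ERROR:P, ERROR:(, ERROR:2, ERROR:)]
    }
}
```

Another option is to try the CUP/CDN alternatives before FUNCTION, so that they claim the keyword before the engine can move past the 'C'.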

Solution

I am not sure if I can answer your questions, but you could try the following.

Use the code generator tool on the regex101 site (on the left side, under Tools) and select Java as your language.

Remove the \n from the generated regex string. Leaving it in might have caused your problems?!
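Whether a leftover line break (or any other layout whitespace) in the pattern matters depends on the flags. With Pattern.COMMENTS, Java's counterpart to the x flag on regex101, whitespace in the pattern, including a real newline character, is ignored; without it, each space and newline is a literal that the input has to contain. A small sketch with just the EOL and SPACE groups (class name hypothetical):

```java
import java.util.regex.Pattern;

public class CommentsFlag {

    // Two alternatives laid out over two lines, as a code generator might emit them
    static final String LAYOUT = "(?<EOL>;)\n | (?<SPACE>\\s+)";

    public static void main(String[] args) {
        // COMMENTS: the newline and spaces in the pattern are ignored,
        // leaving the alternation (?<EOL>;)|(?<SPACE>\s+)
        System.out.println(Pattern.compile(LAYOUT, Pattern.COMMENTS).matcher(";").matches());
        // true

        // No flags: the first alternative is now ";\n " and the second needs a leading space
        System.out.println(Pattern.compile(LAYOUT).matcher(";").matches());
        // false
    }
}
```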

package com.company;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {

        final String regex = "(?<FUNCTION> (?!CUP\\(\\d+\\)|CDN\\(\\d+\\))[A-Z]+\\(\\d*\\)|[A-Z]+\\(\\d+,\\d+\\))" +
                " | (?<NUMBER>\\d+|\\d*\\.\\d+)" +
                " | (?<RELATION>==|<=|>=|!=|<|>|CUP\\(\\d+\\)|CDN\\(\\d+\\))" +
                " | (?<EOL>;)" +
                " | (?<OPENPAR>\\()" +
                " | (?<CLOSEPAR>\\))" +
                " | (?<OPERATION>\\*|\\+|-|\\/])" +
                " | (?<SPACE>\\s+)" +
                " | (?<ERROR>.)";
        final String string = "TICK()<=.(2*STDEVEMA(14))+EMA(14);\n"
                + "TREND(14,2)==UP(2);\n"
                + "TICK()>EMA(14);\n"
                + "TREND(14)==UP(1);\n"
                + "EMA(14)CUP(2)EMA(28);\n"
                + "EMA(14)CDN(2)EMA(28)\n"
                + "2*3==6;\n"
                + "ssst2222  \\\\\\\\///???\n"
                + "sfgjsf\n";

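        // COMMENTS makes the engine ignore the spaces used to lay out the pattern, like regex101's x flag;
        // MULTILINE changes nothing here because the pattern contains no ^ or $
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE | Pattern.COMMENTS);
        final Matcher matcher = pattern.matcher(string);

        // Each find() returns one token; only the named group that produced it is non-null,
        // so the other groups print as null
        while (matcher.find()) {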
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
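Independently of the flags, the OPERATION group (in the question and in the code above) ends in `\/]`. Outside a character class, Java treats the dangling `]` as a literal, so that alternative matches the two-character sequence `/]` and a lone `/` falls through to ERROR. A quick check, assuming the intent was to match `/` by itself (class and field names are hypothetical):

```java
import java.util.regex.Pattern;

public class OperationGroup {

    // As written: the last alternative is "/" followed by a literal "]"
    static final Pattern AS_WRITTEN = Pattern.compile("\\*|\\+|-|\\/]");
    // Assumed intent: "/" on its own (it needs no escaping in Java)
    static final Pattern FIXED = Pattern.compile("\\*|\\+|-|/");

    public static void main(String[] args) {
        System.out.println(AS_WRITTEN.matcher("/").matches());  // false
        System.out.println(AS_WRITTEN.matcher("/]").matches()); // true
        System.out.println(FIXED.matcher("/").matches());       // true
    }
}
```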

BEWARE

I got 67 matches; regex101 got 89.

The difference seems to be that regex101 matches SPACE before ERROR, while Java does not. It could be related to the extended (x) mode on regex101, although Java has a counterpart, Pattern.COMMENTS, which the code above already enables.

Example:

final String string = "TICK()<=.";

gives in Java:

Full match: TICK()
Full match: <=
Full match: .

gives in regex101:

Match 1  Full match 0-6 TICK()
Match 2  Full match 6-8 <= 
Match 3  Full match 8-8  
Match 4  Full match 8-9 .
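To compare the two tools position by position, Java can print each match in the same start-end form. A short sketch (class name hypothetical) with the pattern and flags from the code above. Every alternative in this pattern consumes at least one character, so Java cannot produce an empty match such as regex101's 8-8:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositions {

    // Same pattern and flags as the code above (COMMENTS ignores the layout spaces)
    static final Pattern TOKEN = Pattern.compile(
          "(?<FUNCTION> (?!CUP\\(\\d+\\)|CDN\\(\\d+\\))[A-Z]+\\(\\d*\\)|[A-Z]+\\(\\d+,\\d+\\))"
        + " | (?<NUMBER>\\d+|\\d*\\.\\d+)"
        + " | (?<RELATION>==|<=|>=|!=|<|>|CUP\\(\\d+\\)|CDN\\(\\d+\\))"
        + " | (?<EOL>;)"
        + " | (?<OPENPAR>\\()"
        + " | (?<CLOSEPAR>\\))"
        + " | (?<OPERATION>\\*|\\+|-|\\/])"
        + " | (?<SPACE>\\s+)"
        + " | (?<ERROR>.)",
        Pattern.MULTILINE | Pattern.COMMENTS);

    // Formats each match as "start-end text", like regex101's match list
    static List<String> positions(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + " " + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        positions("TICK()<=.").forEach(System.out::println);
        // 0-6 TICK()
        // 6-8 <=
        // 8-9 .
    }
}
```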