java, regex, regex-group

Regex conditional replacement in Java


I have some numbers on which I want to apply some regex replacement.

+000123,456
+123
000123,456
123
+123.45
-123,45

I want to remove the + sign if present and replace the comma with a dot. The difficult part is that not all the numbers are decimals.

I don't use regex very often, so I usually test my expressions on regex101.

In this case, I created the following regular expression:

([+]*)([0-9]+)(([,])([0-9]+))*

1st capturing group: ([+]*)
2nd capturing group: ([0-9]+)
3rd capturing group: (([,])([0-9]+))*
4th capturing group: ([,])
5th capturing group: ([0-9]+)

and this substitution:

$2${3:+.$5:}

Explanation: use the 2nd capturing group; then, if the 3rd capturing group is present, append a dot and the 5th capturing group, otherwise append nothing.

This seems to work fine on regex101, but when I try to replicate it in Java, it does not work:

private String replaceUsingRegex(final String line) {
    Pattern regex = Pattern.compile("([+]*)([0-9]+)(([,])([0-9]+))*");
    Matcher regexMatcher = regex.matcher(line);
    return regexMatcher.replaceAll("$2${3:+.$5:}");
}

This gives me an IllegalArgumentException at the third line:

java.lang.IllegalArgumentException: named capturing group is missing trailing '}'
    at java.base/java.util.regex.Matcher.appendExpandedReplacement(Matcher.java:1051)
    at java.base/java.util.regex.Matcher.appendReplacement(Matcher.java:997)
    at java.base/java.util.regex.Matcher.replaceAll(Matcher.java:1181)
    at com.mytest.TestRegex.replaceUsingRegex(TestRegex.java:20)

I also tried the Java code generated by regex101, but it still does not work.


Solution

  • "... IllegalArgumentException at the third line ..."

    You get this error because the Java regex implementation does not support the "conditional" replacement syntax used in $2${3:+.$5:}.
    In a Java replacement string, the ${ } construct may contain only the name of a named capturing group. The parser reads 3, finds a : where it expects the closing }, and throws.

    For example, a group declared as (?<name>abc) is referenced in the replacement as ${name}.
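
    As a small illustration of the ${name} form, here is a sketch that only handles values containing a comma. The class name, method name, and the group names whole and fraction are made up for this example.

```java
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Hypothetical group names: "whole" is the integer part, "fraction" the digits after the comma.
    private static final Pattern DECIMAL = Pattern.compile("(?<whole>\\d+),(?<fraction>\\d+)");

    // Rewrites "digits,digits" as "digits.digits" using named group references.
    static String toDot(String input) {
        return DECIMAL.matcher(input).replaceAll("${whole}.${fraction}");
    }

    public static void main(String[] args) {
        System.out.println(toDot("000123,456")); // prints 000123.456
        System.out.println(toDot("123"));        // no comma, so nothing is replaced: prints 123
    }
}
```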

    I believe the syntax you're using is specific to the PCRE family (it looks like PCRE2's extended substitution syntax, ${n:+then:else}), rather than something Java supports.
    Perldoc – perlre – Perl regular expressions.
    Perldoc – perlretut – Perl regular expressions tutorial.

    Here are the JavaDocs for both the Pattern and Matcher classes.
    You can find the complete syntax specifications on the Pattern JavaDoc page.
    Pattern (Java SE 20 & JDK 20).
    Matcher (Java SE 20 & JDK 20).

    And, a relevant excerpt, Matcher#appendReplacement (Java SE 20 & JDK 20).

    "... The replacement string may contain references to subsequences captured during the previous match: Each occurrence of ${name} or $g will be replaced by the result of evaluating the corresponding group(name) or group(g) respectively. For $g, the first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the numerals '0' through '9' are considered as potential components of the group reference. ..."
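
    To make the excerpt concrete, here is a minimal sketch of how Java reads $g references, and of the failure from the question. The class and helper names are made up for this example; only Pattern and Matcher calls from the JDK are used.

```java
import java.util.regex.Pattern;

public class GroupReferenceDemo {
    // Hypothetical helper: compiles the regex and applies replaceAll with the given replacement.
    static String replace(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Plain numbered references.
        System.out.println(replace("(a)(b)", "ab", "$2$1")); // prints ba

        // There is no group 11, so "$11" is read as group 1 followed by a literal 1.
        System.out.println(replace("(a)(b)", "ab", "$11")); // prints a1

        // The PCRE2-style conditional is not understood: after "${" Java expects a group name and "}".
        try {
            replace("(a)", "a", "${1:+x:}");
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: " + e.getMessage());
        }
    }
}
```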

    There are a few things you can do to parse the values.

    "... I want to remove the + sign if present and replace the comma with a dot. The difficult part is that not all the numbers are decimals. ..."

    If each value contains at most one comma and an optional + character, a pair of String#replace calls is enough.

    String[] strings = {
        "+000123,456",
        "+123",
        "000123,456",
        "123",
        "+123.45",
        "-123,45"
    };
    for (String string : strings) {
        string = string.replace("+", "").replace(",", ".");
        System.out.println(string);
    }
    

    Output

    000123.456
    123
    000123.456
    123
    123.45
    -123.45
    

    This, of course, would not suffice when the numbers are within a text which also contains unrelated commas and plus signs.

    For this, you can use the Pattern and Matcher classes to capture the values and append them to a new StringBuilder instance, copying the surrounding text unchanged.

    The pattern will be:

    \+?(-?\d+)(?:[,.](\d+))?
    
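    String string = "abc, +000123,456 def +123, ghi 000123,456 jkl, 123 mno +123.45, pqr -123,45";
    Pattern pattern = Pattern.compile("\\+?(-?\\d+)(?:[,.](\\d+))?");
    Matcher matcher = pattern.matcher(string);
    StringBuilder stringB = new StringBuilder();
    int offset = 0;
    while (matcher.find()) {
        // Copy the text between the previous match and this one unchanged.
        stringB.append(string, offset, matcher.start());
        // Group 1 is the integer part, including an optional minus sign.
        stringB.append(matcher.group(1));
        // Group 2 is the fraction; it is null when the number has no decimals.
        if (matcher.group(2) != null) stringB.append(".").append(matcher.group(2));
        offset = matcher.end();
    }
    // Copy any text that follows the last match.
    stringB.append(string, offset, string.length());
    System.out.println(stringB);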
    

    Output

    abc, 000123.456 def 123, ghi 000123.456 jkl, 123 mno 123.45, pqr -123.45
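
    If you are on Java 9 or later (an assumption, though the java.base frames in your stack trace suggest so), the same conditional logic can be sketched with Matcher#replaceAll(Function), which lets a lambda build each replacement. The class name and the NUMBER constant are made up for this example; the method name is taken from your question.

```java
import java.util.regex.Pattern;

public class ConditionalReplaceDemo {
    // Same pattern as above: optional +, integer part (group 1), optional fraction (group 2).
    private static final Pattern NUMBER = Pattern.compile("\\+?(-?\\d+)(?:[,.](\\d+))?");

    static String replaceUsingRegex(final String line) {
        // The lambda plays the role of the conditional: add ".fraction" only when group 2 matched.
        return NUMBER.matcher(line).replaceAll(match ->
                match.group(2) == null
                        ? match.group(1)
                        : match.group(1) + "." + match.group(2));
    }

    public static void main(String[] args) {
        System.out.println(replaceUsingRegex("+000123,456")); // prints 000123.456
        System.out.println(replaceUsingRegex("+123"));        // prints 123
        System.out.println(replaceUsingRegex("-123,45"));     // prints -123.45
    }
}
```

    Note that the string returned by the lambda is still treated as a replacement string, so a $ or \ in it would be interpreted. That cannot happen here, because the result contains only digits, a minus sign, and a dot; otherwise Matcher.quoteReplacement would be needed.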