Expand Regex SubGroup Matches in Java


I have a function that takes a regex and a template string and returns a lambda that matches the regex against its input. I want the template to be able to reference capture groups from the regex, but I can't find a way to do this in Java.

Minimal example:

import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
  public static Function<String, Map.Entry<String, String>> process(String regex, String template){
    Pattern pattern = Pattern.compile(regex); // compile once, not on every call
    return input -> {
      Matcher m = pattern.matcher(input);
      if(!m.find()){
        return null;
      }
      // Want something like:
      // String key = m.expand(template);
      // That *only* expands template and doesn't add anything else.

      // **Doesn't work**
      // m.replaceFirst/appendReplacement keep parts of the original input
      String key = m.replaceFirst(template);

      return Map.entry(key, input);
    };
  }

  public static void main (String[] args) throws Exception {
    String text = "https://www.myaddress.com?x=y&w=z&other=other&q=taco&temp=1";
    Function<String, Map.Entry<String, String>> func1 = process("myaddress.com.*[?&]q=(\\w+)", "$1");
    Function<String, Map.Entry<String, String>> func2 = process("myaddress.com.*[?&]q=(?<query>\\w+)", "query: ${query}");
    System.out.println(func1.apply(text).getKey());
    // Outputs "https://www.taco&temp=1" want "taco"
    System.out.println(func2.apply(text).getKey());
    // Outputs "https://www.query: taco&temp=1" want "query: taco"
  }
}

This example uses only a single capture group, but the regex and template could be anything, and the solution should support them generically (e.g. process should handle $1, $4 or ${mygroup} in the template for a compatible regex). Forcing the user to write a regex that matches the entire URL is also undesirable.

Go's regexp package has an Expand method for this. How can I achieve the same in Java without reimplementing the parsing of the $ capture-group syntax?

The best workaround I currently have is simply to prepend and append .* to the regex string when compiling it.


Solution

  • Answering my own question with my workaround, as it seems to be the most elegant solution available.


    There is currently no method for this in Java; the best workaround is to expand the match to the entire string.

    Expanding the match looks a little different depending on which of Java's three match operations you're using:

    • matches - This already matches against the full string; a match will necessarily include all input.
    • lookingAt - Matches from the start of the input; append .* to your regex to include the full input in your match (see note 1).
    • find - Matches any substring of the input; prepend and append .* to your regex to match the full input.
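The wrappings in the list above can be sketched as small helper methods. This is a minimal sketch, not the author's code, and the names WrapForFullMatch, forLookingAt and forFind are hypothetical. It makes two adjustments that go beyond the bullets and should be read as assumptions: the caller's regex goes inside a non-capturing (?:...) group so that a top-level alternation such as a|b is not split by the added .*, and the find prefix is the reluctant .*? because a greedy .* prefix makes the engine report the last occurrence instead of the leftmost one that find would return. The added parts use the scoped (?s:...) flag so they also cross line terminators without changing how "." behaves inside the caller's regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WrapForFullMatch {
    // matches() needs no wrapping: it is already anchored at both ends.

    // lookingAt(): anchored at the start only, so a trailing .* absorbs the rest.
    // (?:...) keeps a top-level alternation like "a|b" intact; the scoped (?s:...)
    // lets the .* cross line terminators without affecting the caller's regex.
    static String forLookingAt(String regex) {
        return "(?:" + regex + ")(?s:.*)";
    }

    // find(): unanchored, so a prefix is needed too. The reluctant .*? keeps the
    // leftmost match that find() would report; a greedy .* would pick the last one.
    static String forFind(String regex) {
        return "(?s:.*?)(?:" + regex + ")(?s:.*)";
    }

    public static void main(String[] args) {
        String input = "q=a&q=b";
        String regex = "q=(\\w+)";

        Matcher plain = Pattern.compile(regex).matcher(input);
        plain.find();
        System.out.println(plain.group(1));   // a

        Matcher greedy = Pattern.compile(".*" + regex + ".*").matcher(input);
        greedy.matches();
        System.out.println(greedy.group(1));  // b

        Matcher wrapped = Pattern.compile(forFind(regex)).matcher(input);
        wrapped.matches();
        System.out.println(wrapped.group(1)); // a
    }
}
```

With the input q=a&q=b, the plain find and the reluctant wrapper both capture a, while the naive greedy .* prefix captures b.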

    Then the original replaceFirst approach works as expected: because the match now covers the whole input, nothing outside the expanded template survives the replacement.
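A minimal sketch of the question's process function rewritten with this workaround follows. It is not the author's code: the class name ExpandTemplate is hypothetical, and it assumes the regex should behave as it does under find, so the pattern is wrapped as (?s:.*?)(?:regex)(?s:.*) (reluctant prefix to keep the leftmost match, non-capturing group around the caller's regex, scoped DOTALL so the added parts also cross line breaks). Map.entry requires Java 9 or later.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpandTemplate {
    // Sketch: expand the match to the whole input, then let replaceFirst
    // substitute the template for that whole-input match.
    static Function<String, Map.Entry<String, String>> process(String regex, String template) {
        Pattern p = Pattern.compile("(?s:.*?)(?:" + regex + ")(?s:.*)");
        return input -> {
            Matcher m = p.matcher(input);
            if (!m.matches()) {
                return null; // the original regex matches nowhere in the input
            }
            // The match spans the entire input, so only the expanded template remains.
            String key = m.replaceFirst(template);
            return Map.entry(key, input);
        };
    }

    public static void main(String[] args) {
        String text = "https://www.myaddress.com?x=y&w=z&other=other&q=taco&temp=1";
        System.out.println(process("myaddress.com.*[?&]q=(\\w+)", "$1").apply(text).getKey());
        // taco
        System.out.println(process("myaddress.com.*[?&]q=(?<query>\\w+)", "query: ${query}").apply(text).getKey());
        // query: taco
    }
}
```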

    Note that once the match is expanded to the whole input, any match operation becomes equivalent to using matches.

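    1 Equivalently, in the lookingAt case you can call appendReplacement without appendTail instead of expanding the match: the match starts at index 0, so appendReplacement copies no input before the expanded template, and skipping appendTail leaves out the rest.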
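The footnote's alternative can be sketched as below; the class and method names (ExpandViaAppend, expand) are hypothetical. It relies on documented appendReplacement behavior: on the first call it copies the input from index 0 up to the start of the match, then appends the expanded replacement. With lookingAt the match starts at 0, so nothing is copied. As an extension of the footnote, the same trick works with find by dropping the first m.start() characters of the buffer, which leaves just the expanded template without modifying the regex. The StringBuilder overload of appendReplacement requires Java 9 or later; older versions can use StringBuffer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpandViaAppend {
    // Expands template against the first find() match, leaving the regex untouched.
    static String expand(String regex, String template, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        // Appends input[0, m.start()) followed by the expanded template.
        // appendTail is deliberately not called, so the suffix is never added.
        m.appendReplacement(sb, template);
        // Drop the copied prefix; for lookingAt() m.start() is 0 and nothing is dropped.
        return sb.substring(m.start());
    }

    public static void main(String[] args) {
        String text = "https://www.myaddress.com?x=y&w=z&other=other&q=taco&temp=1";
        System.out.println(expand("myaddress.com.*[?&]q=(\\w+)", "$1", text));
        // taco
        System.out.println(expand("myaddress.com.*[?&]q=(?<query>\\w+)", "query: ${query}", text));
        // query: taco
    }
}
```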