Java: Converting File Pattern to Regular Expression Pattern


I am trying to write a utility function that converts a file pattern to a Java regular expression pattern. I need it for wildcard matching of files inside a directory. I came up with four cases that need to be considered. Are these cases sufficient?

    regexPattern = filePattern;
    // convert windows backslash to slash
    regexPattern = regexPattern.replace("\\", "/");
    // convert dot to \\.
    regexPattern = regexPattern.replace("\\.", "\\\\.");
    // convert ? wildcard to .+
    regexPattern = regexPattern.replace("?", ".+");
    // convert * wildcard to .*
    regexPattern = regexPattern.replace("*", ".*");

Solution

  • Someone already did this: http://www.rgagnon.com/javadetails/java-0515.html

    As you can see, other reserved regex characters (described in "What special characters must be escaped in regular expressions?", i.e. .^$*+?()[{\|) also have to be escaped, not only the dot.

    Parsing the pattern character by character is safer than chaining String#replace(..) calls. With replace you have to be careful about the order of replacements, so that a later call does not rewrite the output of an earlier one (imagine what happens if, in your example, you first replace the dot with \\. and then replace the Windows backslash with a slash).
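
    The ordering hazard can be seen in a small sketch. The class and method names are invented for illustration, and the two plain String#replace calls stand in for the question's replacement chain rather than reproducing it exactly.

    ```java
    final class ReplaceOrderDemo {
        // Wrong order: escape dots first, then normalize Windows separators.
        static String dotThenSlash(String filePattern) {
            String p = filePattern.replace(".", "\\.");  // "." becomes "\."
            return p.replace("\\", "/");                 // also rewrites the backslash just added
        }

        // Safer order: normalize separators first, then escape dots.
        static String slashThenDot(String filePattern) {
            String p = filePattern.replace("\\", "/");
            return p.replace(".", "\\.");
        }

        public static void main(String[] args) {
            String filePattern = "logs\\app.log";            // the text logs\app.log
            System.out.println(dotThenSlash(filePattern));   // logs/app/.log
            System.out.println(slashThenDot(filePattern));   // logs/app\.log
        }
    }
    ```

    In the first variant the escape that was meant to protect the dot is itself turned into a path separator, so the resulting regex no longer describes the original file name.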

    However, I am afraid that example does not work for all cases, because glob syntax varies across implementations; see the Wikipedia entry on glob.
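
    Because the dialects differ, one option is to skip the conversion and let the JDK apply its own glob dialect. This is only a sketch: it assumes Java 7 or later (the question does not say which version is in use), the class and method names are made up, and the JDK dialect is not identical to cmd.exe wildcards (for example, * does not cross directory separators).

    ```java
    import java.nio.file.FileSystems;
    import java.nio.file.PathMatcher;
    import java.nio.file.Paths;

    final class GlobMatchDemo {
        // Returns true if fileName matches the glob in the JDK's glob dialect.
        static boolean matches(String glob, String fileName) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
            return matcher.matches(Paths.get(fileName));
        }

        public static void main(String[] args) {
            System.out.println(matches("*.txt", "notes.txt"));      // true
            System.out.println(matches("*.txt", "notes.md"));       // false
            System.out.println(matches("*.{txt,log}", "app.log"));  // true
            System.out.println(matches("data?.csv", "data1.csv"));  // true
        }
    }
    ```

    For listing a directory, Files.newDirectoryStream(Path, String) accepts the same kind of glob directly, which may be enough if the JDK dialect fits your patterns.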

    For simple Windows cmd patterns the code would be:

    public static String wildcardToRegex(String wildcard) {
        // StringBuilder: a local buffer needs no synchronization
        StringBuilder s = new StringBuilder(wildcard.length());
        s.append('^'); // anchor at the start of the name
        for (int i = 0, is = wildcard.length(); i < is; i++) {
            char c = wildcard.charAt(i);
            switch (c) {
                case '*': // any run of characters, including none
                    s.append(".*");
                    break;
                case '?': // exactly one character
                    s.append(".");
                    break;
                case '^': // escape character in cmd.exe
                    s.append("\\");
                    break;
                // escape special regexp-characters ('+' included, it is a quantifier)
                case '(': case ')': case '[': case ']': case '$':
                case '.': case '{': case '}': case '|': case '+':
                case '\\':
                    s.append("\\");
                    s.append(c);
                    break;
                default:
                    s.append(c);
                    break;
            }
        }
        s.append('$'); // anchor at the end of the name
        return s.toString();
    }
    

    This does not handle escaping of characters other than * and ? well (^w should be converted into w and not \w, which has a special meaning in regex), but you can easily improve that.
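
    One way to make that improvement, as a sketch rather than a definitive fix: treat ^ as "take the next character literally" and add a backslash only when that character is a regex metacharacter. The class name CmdWildcard, the REGEX_META constant and the appendLiteral helper are hypothetical, and a trailing ^ is assumed to stand for itself.

    ```java
    import java.util.regex.Pattern;

    final class CmdWildcard {
        // Characters with a special meaning in java.util.regex patterns.
        private static final String REGEX_META = "\\.[]{}()*+?^$|";

        // Appends c so that the regex matches it literally (hypothetical helper).
        private static void appendLiteral(StringBuilder s, char c) {
            if (REGEX_META.indexOf(c) >= 0) {
                s.append('\\');
            }
            s.append(c);
        }

        static String wildcardToRegex(String wildcard) {
            StringBuilder s = new StringBuilder(wildcard.length() + 2);
            s.append('^');
            for (int i = 0; i < wildcard.length(); i++) {
                char c = wildcard.charAt(i);
                if (c == '^' && i + 1 < wildcard.length()) {
                    // cmd.exe escape: consume the next character and match it literally
                    appendLiteral(s, wildcard.charAt(++i));
                } else if (c == '*') {
                    s.append(".*");
                } else if (c == '?') {
                    s.append('.');
                } else {
                    // ordinary character, or a trailing '^' (assumed to stand for itself)
                    appendLiteral(s, c);
                }
            }
            return s.append('$').toString();
        }

        public static void main(String[] args) {
            System.out.println(wildcardToRegex("^w*.t?t"));                       // ^w.*\.t.t$
            System.out.println(Pattern.matches(wildcardToRegex("a^*b"), "a*b"));  // true
            System.out.println(Pattern.matches(wildcardToRegex("a^*b"), "axb"));  // false
        }
    }
    ```

    With this version ^w yields a plain w, and an escaped wildcard such as ^* matches a literal asterisk instead of acting as a wildcard.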