Tags: java, regex, filefilter

Escape a part of the full path from regex while keeping one part of it


Need

I have an archive of folders which looks like this:

C:\Users\myUser\myArchive\.
├───v1.ci
│   └───Linux
│       ├───111-001
│       └───222-ci
├───v1.dev
│   └───Linux
│       ├───111-001
│       ├───222-001
│       └───333-001
├───v2.ci
│   └───Linux
│       ├───111-001
│       └───222-ci
├───v2.dev
│   └───Linux
│       ├───111-001
│       ├───222-001
│       └───333-001
└───v2.safe
    └───Linux
        ├───111-001
        └───222-ci

I want to make a static function in Java which, given an archive path (in this example the location C:\Users\myUser\myArchive\) and a pattern, returns a List<String> with all the folders matching that pattern.

For example, if I set setupsArchive = C:\Users\myUser\myArchive\ and pattern = v*.ci, then the list should contain v1.ci and v2.ci (the two folders matching this pattern).

Note: no need for recursion. I only care about the names of the folders right below my archive, I don't care what's inside them.

Code working, but only for Linux

This function works when run in a Unix environment:

private static List<String> getVersionsMatchingPattern(String pattern, String setupsArchive) {
    File allVersions = new File(setupsArchive);
    FileFilter versionFilter = pathname -> pathname.isDirectory() && pathname.toString().matches(setupsArchive + pattern);
    File[] filteredVersions = allVersions.listFiles(versionFilter);
    List<String> matchedVersions = new ArrayList<>();
    for (File version : filteredVersions) {
        matchedVersions.add(version.getName());
    }
    matchedVersions.sort(Collections.reverseOrder());
    return matchedVersions;
}

However, when I run it on Windows, it raises an exception on this line:

FileFilter versionFilter = pathname -> pathname.isDirectory() && pathname.toString().matches(setupsArchive + pattern);

The exception is a java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 3. It occurs because, unlike on Unix, the path separator on Windows is the backslash (\). When I pass C:\Users\..., the \U is parsed as a regex escape sequence, which is illegal, in the pathname.toString().matches(setupsArchive + pattern) call.
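The failure can be reproduced in isolation. The following is a minimal sketch, not part of the original code: the class name BackslashEscapeDemo, the helper compiles, and the Unix path are hypothetical and chosen only for illustration. It checks whether each string is accepted as a regular expression, with and without Pattern.quote().

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BackslashEscapeDemo {
    // Returns true if the given string is accepted as a regular expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Java source needs doubled backslashes; the actual value is C:\Users\myUser\myArchive\
        String windowsPath = "C:\\Users\\myUser\\myArchive\\";
        // Hypothetical Unix location, used only for comparison
        String unixPath = "/home/myUser/myArchive/";

        System.out.println(compiles(unixPath));                   // true: no backslashes to misread
        System.out.println(compiles(windowsPath));                // false: \U is not a valid escape
        System.out.println(compiles(Pattern.quote(windowsPath))); // true: the path is treated literally
    }
}
```

Pattern.quote() wraps its argument in \Q...\E, so every character of the path is matched literally and the backslashes no longer start escape sequences.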

My attempts to make it work under Windows

I understand that I need to escape the setupsArchive part of my regular expression, so that matches() interprets only the pattern part as a regex.

Hence I've tried to:

1. Wrap setupsArchive in Pattern.quote():

FileFilter versionFilter = pathname -> pathname.isDirectory() && pathname.toString().matches(Pattern.quote(setupsArchive) + pattern);

2. Apply the regex match only to the basename of the analyzed folder:

FileFilter versionFilter = pathname -> pathname.isDirectory() && pathname.getName().matches(pattern);

In both cases, the code compiles and runs without errors, but nothing passes the filter (i.e. the returned list is empty even though there are folders matching the pattern).

Does anyone have any idea?


Solution

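  • You can use Pattern#asPredicate() to filter on names. Note that the resulting predicate tests whether the pattern is found anywhere in the name, not whether the whole name matches.

    File#getName() returns the name of the directory (without the full path).

    You can either filter the files by type (directory/file) and then filter the result by name, or map the files to their names first and then filter.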

    final Pattern rx = Pattern.compile("AB"); // Matches names which contain 'AB'
    
    File baseDir = new File("C:\\Users\\myUser\\myArchive\\");
    Predicate<String> nameMatcher = rx.asPredicate();
    
    // this will result in a list of File
    List<File> result = Arrays.stream(baseDir.listFiles())
        .filter(f->f.isDirectory())
        .filter(f->nameMatcher.test(f.getName()))
        .collect(Collectors.toList());
    
    System.out.println(result); // [C:\Users\myUser\myArchive\ABC003PR, C:\Users\myUser\myArchive\TAB113]
    
    
    // this will result in a list of String 
    List<String> result2 = Arrays.stream(baseDir.listFiles())
            .filter(f->f.isDirectory())
            .map(File::getName)
            .filter(nameMatcher)
            .collect(Collectors.toList());
    System.out.println(result2); // [ABC003PR, TAB113]
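A possible reason both attempts in the question return an empty list: v*.ci is written as a glob, but String#matches() reads it as a regular expression, where it means "zero or more v, any single character, then ci", so v1.ci does not match (the regex equivalent would be v.*\.ci). As a sketch, assuming the pattern is meant to be a glob, the function from the question could delegate to the glob matcher in java.nio.file. The names VersionFinder and getVersionsMatchingGlob are hypothetical.

```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class VersionFinder {
    // Lists the sub-directories of setupsArchive whose name matches a glob such as "v*.ci".
    static List<String> getVersionsMatchingGlob(String glob, String setupsArchive) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);

        // Match on the last path element only, so the archive path never reaches the matcher.
        File[] candidates = new File(setupsArchive).listFiles(
                f -> f.isDirectory() && matcher.matches(f.toPath().getFileName()));

        List<String> matched = new ArrayList<>();
        // listFiles returns null when setupsArchive is not a readable directory.
        if (candidates != null) {
            for (File version : candidates) {
                matched.add(version.getName());
            }
        }
        matched.sort(Collections.reverseOrder());
        return matched;
    }

    public static void main(String[] args) {
        // Assumption: the archive location is passed as the first argument, else the current directory.
        String archive = args.length > 0 ? args[0] : ".";
        System.out.println(getVersionsMatchingGlob("v*.ci", archive));
    }
}
```

With the archive from the question, getVersionsMatchingGlob("v*.ci", "C:\\Users\\myUser\\myArchive\\") should return v2.ci and v1.ci, in that order, because the result is sorted in reverse.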