Tags: character-encoding, java-7, zos, filetree

How do I use SimpleFileVisitor in Java to find a file name whose encoding may vary?


I'm using SimpleFileVisitor to search for a file. It works fine on Windows and Linux. However, when I try it on Unix-like operating systems, it doesn't work as expected. I get errors like this:

java.nio.file.NoSuchFileException:
   /File/Location/MyFolder/\u0082\u0096\u0096âĜu0099\u0081\u0097K
                           \u0097\u0099\u0096\u0097\u0085\u0099Ĝu0089\u0085

It looks like the name I get back is in a different character encoding, and that may be what is causing the issue: somewhere between obtaining the name and trying to access the file, the encoding gets mixed up. As a result, preVisitDirectory is called once, and then visitFileFailed is called for every file the walk tries to visit. I'm not sure why the walkFileTree method does that. Any ideas?
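One way a name can get "mixed up" between listing and access is a lossy decode/encode round trip: the directory listing yields raw bytes, the JVM decodes them into a String with some charset, and opening the file encodes that String back into bytes. If the charset cannot represent the original bytes, the bytes that come back differ and the file is not found. The minimal sketch below illustrates that mechanism only; the byte values and the choice of UTF-8 versus ISO-8859-1 are assumptions for illustration, not a reproduction of the asker's system, and RoundTripDemo/roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {

    // Decode raw file-name bytes with a charset, then encode the String back
    // with the same charset, mimicking "list the name, then open the file".
    static byte[] roundTrip(byte[] rawName, Charset charset) {
        String decoded = new String(rawName, charset);
        return decoded.getBytes(charset);
    }

    public static void main(String[] args) {
        // Hypothetical raw name: three bytes that are not valid UTF-8 by themselves
        byte[] rawName = {(byte) 0x82, (byte) 0x96, (byte) 0x96};

        // UTF-8 replaces each malformed byte with U+FFFD, so the bytes change
        byte[] viaUtf8 = roundTrip(rawName, StandardCharsets.UTF_8);
        // ISO-8859-1 maps every byte to exactly one char and back, so nothing is lost
        byte[] viaLatin1 = roundTrip(rawName, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip preserved: " + Arrays.equals(rawName, viaUtf8));
        System.out.println("ISO-8859-1 round trip preserved: " + Arrays.equals(rawName, viaLatin1));
    }
}
```

Run as written, it should report that the UTF-8 round trip does not preserve the bytes while the ISO-8859-1 one does.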

My code that uses SimpleFileVisitor looks like this:

 Files.walkFileTree(serverLocation, finder);
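For context, here is a self-contained sketch of the same walkFileTree pattern that can be run anywhere. It uses an anonymous SimpleFileVisitor instead of the Finder class; the class name WalkDemo, the temporary directory layout, and the "*.properties" pattern are hypothetical choices for illustration. Captured locals are declared final so it also compiles on Java 7.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class WalkDemo {

    // Walk 'root' and collect every regular file whose name matches the glob.
    static List<Path> find(Path root, String glob) throws IOException {
        final PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        final List<Path> matches = new ArrayList<Path>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Match against the file name only, as the Finder class does
                if (matcher.matches(file.getFileName())) {
                    matches.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // Report the failure and keep walking instead of aborting
                System.err.println("Issue: " + e);
                return FileVisitResult.CONTINUE;
            }
        });
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree so the example is self-contained
        Path root = Files.createTempDirectory("walkdemo");
        Path sub = Files.createDirectories(root.resolve("sub"));
        Files.createFile(root.resolve("a.properties"));
        Files.createFile(sub.resolve("b.properties"));
        Files.createFile(sub.resolve("c.txt"));

        for (Path p : find(root, "*.properties")) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Because visitFileFailed returns CONTINUE, a name that cannot be accessed is logged and skipped rather than ending the walk, which matches the behaviour described in the question.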

My SimpleFileVisitor class:

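import static java.nio.file.FileVisitResult.CONTINUE;

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class Finder extends SimpleFileVisitor<Path> {
  private final PathMatcher matcher;
  private final List<Path> matchedPaths = new ArrayList<Path>();
  private String usedPattern = null;

  Finder(String pattern) {
    this.usedPattern = pattern;
    matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
  }

  // Compare the glob pattern against the name (last element) of a file or directory.
  void match(Path file) {
    Path name = file.getFileName();
    if (name != null && matcher.matches(name)) {
      matchedPaths.add(file);
    }
  }

  // Check each file.
  @Override
  public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
    match(file);
    return CONTINUE;
  }

  // Check each directory.
  @Override
  public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
    match(dir);
    return CONTINUE;
  }

  // Called when a file cannot be visited (e.g. its attributes cannot be read); log it and keep walking.
  @Override
  public FileVisitResult visitFileFailed(Path file, IOException e) {
    System.out.println("Issue: " + e);
    return CONTINUE;
  }
}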
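When visitFileFailed fires, it may help to print the failing name's characters as code points, so you can see exactly what the JVM decoded. If every character falls in a range like U+0080 to U+00FF, as in the exception above, that may indicate raw bytes were decoded one-to-one with a charset that does not match the one the names were written in. This is a diagnostic sketch; NameDump, describeName and FailureReporter are hypothetical names, not part of the original code.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;

public class NameDump {

    // Render each UTF-16 code unit of the last path element as U+XXXX,
    // so mis-decoded names are easy to compare against an encoding table.
    static String describeName(Path file) {
        Path name = file.getFileName();
        if (name == null) {
            return "(no file name)";
        }
        String s = name.toString();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // A visitor that only reports failures, including the name's code units.
    static class FailureReporter extends SimpleFileVisitor<Path> {
        @Override
        public FileVisitResult visitFileFailed(Path file, IOException e) {
            System.out.println("Issue: " + e + " name=" + describeName(file));
            return FileVisitResult.CONTINUE;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java NameDump <directory>");
            return;
        }
        Files.walkFileTree(Paths.get(args[0]), new FailureReporter());
    }
}
```

For example, describeName(Paths.get("dir", "ab.K")) returns "U+0061 U+0062 U+002E U+004B".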

Solution

  • Try using "Charset.defaultCharset()" when you create the "file" and "dir" strings you pass around. Otherwise, you could very easily mangle the names while creating those strings to pass to your visit methods.

    You might also check the default encoding of the JVM you are running. If it is out of sync with the file system you are reading, your results will be, err, unpredictable.
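To check the JVM's encoding settings as suggested above, a small report like the one below can be run on the affected machine. It is a sketch: EncodingReport is a hypothetical name, and sun.jnu.encoding is an implementation-specific property (used for file names in OpenJDK-based JVMs); other JVMs may not set it, in which case it prints null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {

    // Collect the encoding settings most relevant to how this JVM decodes names.
    static Map<String, String> report() {
        Map<String, String> m = new LinkedHashMap<String, String>();
        m.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        // Property traditionally used to configure the default charset
        m.put("file.encoding", System.getProperty("file.encoding"));
        // Implementation-specific; may be null on JVMs that do not define it
        m.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : report().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

If these values do not match the encoding the file names were actually written in, that mismatch is a likely suspect for the NoSuchFileException.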