Tags: java, filesystems, portability

Check if files under a root are named in a portable way


I want to check whether all the files in a given folder have portable names, or whether some have unfortunate names that may make it impossible to represent the same file structure on other file systems. I want to support at least the most common cases. For example, on Windows you cannot have a file called aux.txt, and file names are not case sensitive. This is my best attempt, but I'm not an expert in operating system and file system design. On Wikipedia I've found 'incomplete' lists of possible problems, but how can I catch all the issues? Please look at my code below and tell me if I've missed any subtle unfortunate case. In particular, I've found a lot of 'Windows issues'. Are there any Linux/Mac issues that I should check for?

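import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class CheckFileSystemPortable {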
  Path top;
  List<Path> okPaths=new ArrayList<>();
  List<Path> badPaths=new ArrayList<>();
  List<Path> repeatedPaths=new ArrayList<>();

  CheckFileSystemPortable(Path top){
    assert Files.isDirectory(top);
    this.top=top;

    try (Stream<Path> walk = Files.walk(top)) {//the first one is guaranteed to be the root
      walk.skip(1).forEach(this::checkSystemIndependentPath);
    } catch (IOException e) {
      throw new Error(e);
    }

    for(var p:okPaths) {
      checkRepeatedPaths(p);
    }

    okPaths.removeAll(repeatedPaths);
  }

  private void checkRepeatedPaths(Path p) {
    var s=p.toString();
    for(var pi:okPaths){
      if (pi!=p && pi.toString().equalsIgnoreCase(s)) {
        repeatedPaths.add(pi);
      }
    }
  }

//incomplete list from wikipedia below:
//https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
  private static final List<String>forbiddenWin=List.of(
    "CON", "PRN", "AUX", "CLOCK$", "NUL",
    "COM0", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
    "LPT0", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
    "LST", "KEYBD$", "SCREEN$", "$IDLE$", "CONFIG$", 
    "$Mft", "$MftMirr", "$LogFile", "$Volume", "$AttrDef", "$Bitmap", "$Boot",
    "$BadClus", "$Secure", "$Upcase", "$Extend", "$Quota", "$ObjId", "$Reparse"
    );

  private void checkSystemIndependentPath(Path path) {
    String lastName=path.getName(path.getNameCount()-1).toString();
    String[] parts=lastName.split("\\.");

    var ko = forbiddenWin.stream()
        .filter(f -> Stream.of(parts).anyMatch(p->p.equalsIgnoreCase(f)))
        .count();

    if(ko!=0) {
      badPaths.add(path);
    } else {
      okPaths.add(path);
    }
  }
}

Solution

  • If I understand your question correctly, and going by the Filename Wikipedia page, portable file names should:

    • Be POSIX compliant, i.e. use only ASCII letters and digits plus ., _ and -.
    • Avoid Windows and DOS device names.
    • Avoid NTFS special names.
    • Avoid special characters, e.g. \, |, /, $.
    • Avoid a trailing space or dot.
    • Avoid a leading -.
    • Stay within the maximum length, e.g. 8-bit FAT allows at most 9 characters.
    • Allow for systems that expect an extension: a . followed by a 3-letter extension.

    With all that in mind, checkSystemIndependentPath could be simplified a bit to cover most of those cases using regular expressions.

    For example, a check for POSIX file names that excludes special devices, NTFS special names, special characters, and a trailing space or dot:

    private void checkSystemIndependentPath(Path path){
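        // Device names are case-insensitive on Windows, with or without an extension.
        String reserved = "(?i)^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\\..*)?$";
        // POSIX portable character set; this also rejects NTFS names starting with $.
        String posix = "^[a-zA-Z0-9._-]+$";
        String trailing = ".*[\\s.]$";
        // Example limit only (8-bit FAT); pick the limit of the strictest target system.
        int nameLimit = 9;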
    
        String fileName = path.getFileName().toString();
    
        if (fileName.matches(posix) &&
                !fileName.matches(reserved) &&
                !fileName.matches(trailing) &&
                fileName.length() <= nameLimit) {
            okPaths.add(path);
        } else {
            badPaths.add(path);
        }
    }
    

    Note that the example is not tested and doesn't cover edge cases. For example, some systems ban dots in directory names, and some will complain about multiple dots in a file name.
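
For reference, here is a minimal self-contained sketch of the checks listed above, combined with the case-insensitive duplicate check from the question. The class name PortableNames, the method names, and the 255-character limit are assumptions made for illustration, not part of the original code; the reserved list follows the question in also treating COM0 and LPT0 as device names. It is a conservative filter, not a complete portability test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class; names and limits are illustrative assumptions.
final class PortableNames {
    // Assumed limit: 255 is a common per-name maximum, not a universal one.
    static final int MAX_LENGTH = 255;

    // POSIX portable filename character set: letters, digits, '.', '_', '-'.
    // Spaces, '$', '\', '|' and '/' are therefore rejected as well.
    private static final Pattern POSIX = Pattern.compile("[A-Za-z0-9._-]+");

    // DOS/Windows device names, case-insensitive, with or without an extension.
    private static final Pattern RESERVED = Pattern.compile(
        "(CON|PRN|AUX|NUL|COM[0-9]|LPT[0-9])(\\..*)?", Pattern.CASE_INSENSITIVE);

    /** Returns true if a single file name (not a whole path) passes all checks. */
    static boolean isPortable(String name) {
        return name.length() <= MAX_LENGTH
            && POSIX.matcher(name).matches()
            && !RESERVED.matcher(name).matches()
            && !name.startsWith("-")   // a leading '-' looks like a command-line option
            && !name.endsWith(".");    // trailing dot; trailing space already fails POSIX
    }

    /** Groups names that differ only by case and returns groups with more than one member. */
    static List<List<String>> caseCollisions(List<String> names) {
        Map<String, List<String>> byKey = new HashMap<>();
        for (String n : names) {
            byKey.computeIfAbsent(n.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(n);
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> group : byKey.values()) {
            if (group.size() > 1) {
                result.add(group);
            }
        }
        return result;
    }
}
```

In the question's class, isPortable could be called with path.getFileName().toString() for each path, and caseCollisions with the string form of each accepted path. Grouping by a lower-cased key visits each name once instead of comparing every pair of paths.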