TPath ignore case when accessing file [Java TrueZip]


Is there a way to access a file inside an archive while ignoring file name case using TrueZip?

Imagine the following zip archive:

MyZip.zip
-> myFolder/tExtFile.txt
-> anotherFolder/TextFiles/file.txt
-> myFile.txt
-> anotherFile.txt
-> OneMOREfile.txt

This is how it works:

TPath tPath = new TPath("MyZip.zip\\myFolder\\tExtFile.txt");
System.out.println(tPath.toFile().getName()); //prints tExtFile.txt 

How can I do the same but ignore case, like this:

// note "myFolder" changed to "myfolder" and "tExtFile" to "textfile"    
TPath tPath = new TPath("MyZip.zip\\myfolder\\textfile.txt");
System.out.println(tPath.toFile().getName()); // should print tExtFile.txt

The code above throws FsEntryNotFoundException ... (no such entry)

It works for a regular java.io.File. I am not sure why it does not work for TrueZip's TFile, or am I missing something?

My goal is to access each file using only lowercase names for files and folders.

Edit: 24-03-2017

Let's say I would like to read the bytes of a file inside the zip archive MyZip.zip mentioned above:

Path tPath = new TPath("...MyZip.zip\\myFolder\\tExtFile.txt");
byte[] bytes = Files.readAllBytes(tPath); //returns bytes of the file 

The snippet above works, but the one below does not (it throws the FsEntryNotFoundException mentioned above). It is the same path and file, just in lowercase.

Path tPath = new TPath("...myzip.zip\\myfolder\\textfile.txt");
byte[] bytes = Files.readAllBytes(tPath);

Solution

  • You said:

    My goal is to access each file just using only lowercase for files and folders.

    But wishful thinking will not get you very far here. As a matter of fact, many file systems (notably the ones typically used on Linux and other UNIX-like systems) are case-sensitive, i.e. it makes a big difference whether you use upper- or lower-case characters. There you can even have the "same" file name in different case multiple times in the same directory, so file.txt, File.txt and file.TXT are three different names. Windows is an exception here, but TrueZIP does not emulate a Windows file system. It emulates a general archive file system which works for ZIP, TAR etc. on all platforms. Thus, you do not get to choose between upper- and lower-case characters; you have to use them exactly as stored in the ZIP archive.


    Update: Just as a little proof, I logged into a remote Linux box with an extfs file system and did this:

    ~$ mkdir test
    ~$ cd test
    ~/test$ touch file.txt
    ~/test$ touch File.txt
    ~/test$ touch File.TXT
    ~/test$ ls -l
    total 0
    -rw-r--r-- 1 group user 0 Mar 25 00:14 File.TXT
    -rw-r--r-- 1 group user 0 Mar 25 00:14 File.txt
    -rw-r--r-- 1 group user 0 Mar 25 00:14 file.txt
    

    As you can clearly see, there are three distinct files, not just one.

    And what happens if you zip those three files into an archive?

    ~/test$ zip ../files.zip *
      adding: File.TXT (stored 0%)
      adding: File.txt (stored 0%)
      adding: file.txt (stored 0%)
    

    Three files added. But are they still distinct files in the archive, or are they stored under just one name?

    ~/test$ unzip -l ../files.zip
    Archive:  ../files.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  2017-03-25 00:14   File.TXT
            0  2017-03-25 00:14   File.txt
            0  2017-03-25 00:14   file.txt
    ---------                     -------
            0                     3 files
    

    "3 files", it says - quod erat demonstrandum.

    As you can see, Windows is not the whole world. But if you copy that archive to a Windows box and unzip it there, typically only one file will end up on a disk with an NTFS or FAT file system, and which one depends on the tool and the extraction order. That is very bad if the three files have different contents.
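
The same experiment can be repeated on any platform without touching the local file system. The following is a minimal sketch that uses only the JDK's java.util.zip classes (not TrueZIP) and builds the archive in memory; the class name ZipCaseDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; the name is not taken from TrueZIP or from the question.
final class ZipCaseDemo {

  // Builds an in-memory ZIP archive containing one empty entry per given name.
  static byte[] zipOf(String... names) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String name : names) {
        zip.putNextEntry(new ZipEntry(name));
        zip.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Returns the entry names of the archive in the order in which they are stored.
  static List<String> entryNames(byte[] archive) {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
      for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
        names.add(entry.getName());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }

  public static void main(String[] args) {
    // The three names differ only in case, yet each one becomes its own entry.
    System.out.println(entryNames(zipOf("File.TXT", "File.txt", "file.txt")));
  }
}
```

All three names come back as separate entries, because the ZIP format compares entry names exactly as stored.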


    Update 2: Okay, there is no solution within TrueZIP for the reasons explained in detail above, but if you want to work around it, you can do it manually like this:

    package de.scrum_master.app;
    
    import de.schlichtherle.truezip.nio.file.TPath;
    
    import java.io.IOException;
    import java.net.URISyntaxException;
    import java.nio.file.Files;
    
    public class Application {
      public static void main(String[] args) throws IOException, URISyntaxException {
        TPathHelper tPathHelper = new TPathHelper(
          new TPath(
            "../../../downloads/powershellarsenal-master.zip/" +
              "PowerShellArsenal-master\\LIB/CAPSTONE\\LIB\\X64\\LIBCAPSTONE.DLL"
          )
        );
        TPath caseSensitivePath = tPathHelper.getCaseSensitivePath();
        System.out.printf("Original path: %s%n", tPathHelper.getOriginalPath());
        System.out.printf("Case-sensitive path: %s%n", caseSensitivePath);
        System.out.printf("File size: %,d bytes%n", Files.readAllBytes(caseSensitivePath).length);
      }
    }
    
    package de.scrum_master.app;
    
    import de.schlichtherle.truezip.file.TFile;
    import de.schlichtherle.truezip.nio.file.TPath;
    
    import java.io.IOException;
    import java.net.URISyntaxException;
    import java.nio.file.Path;
    
    public class TPathHelper {
      private final TPath originalPath;
      private TPath caseSensitivePath;
    
      public TPathHelper(TPath tPath) {
        originalPath = tPath;
      }
    
      public TPath getOriginalPath() {
        return originalPath;
      }
    
      public TPath getCaseSensitivePath() throws IOException, URISyntaxException {
        if (caseSensitivePath != null)
          return caseSensitivePath;
        final TPath absolutePath = new TPath(originalPath.toFile().getCanonicalPath());
        TPath matchingPath = absolutePath.getRoot();
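        for (Path subPath : absolutePath) {
          // listFiles() returns null if matchingPath is not a directory or cannot be listed
          TFile[] candidateFiles = matchingPath.toFile().listFiles();
          if (candidateFiles == null)
            throw new IOException("cannot list entries of '" + matchingPath + "'");
          boolean matchFound = false;
          for (TFile candidateFile : candidateFiles) {
            // Compare names ignoring case; the first match wins
            if (candidateFile.getName().equalsIgnoreCase(subPath.toString())) {
              matchFound = true;
              matchingPath = new TPath(matchingPath.toString(), candidateFile.getName());
              break;
            }
          }
          if (!matchFound)
            throw new IOException("element '" + subPath + "' not found in '" + matchingPath + "'");
        }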
        caseSensitivePath = matchingPath;
        return caseSensitivePath;
      }
    }
    

    Of course, this is a little ugly and will just get you the first matching path if there are multiple case-insensitive matches in an archive. The algorithm will stop searching after the first match in each subdirectory. I am not particularly proud of this solution, but it was a nice exercise and you seem to insist that you want to do it this way. I just hope you are never confronted with a UNIX-style ZIP archive created on a case-sensitive file system and containing multiple possible matches.
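
If silently taking the first match is too risky, the name comparison could collect all case-insensitive matches and fail when the result is ambiguous. The following is only a sketch of that idea on plain strings; CaseInsensitiveMatcher and its methods are hypothetical names and are not part of TrueZIP or of the helper class above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it works on plain names, not on TrueZIP types.
final class CaseInsensitiveMatcher {

  // Returns every candidate that equals the wanted name when case is ignored.
  static List<String> matches(List<String> candidates, String wanted) {
    List<String> found = new ArrayList<>();
    for (String candidate : candidates) {
      if (candidate.equalsIgnoreCase(wanted))
        found.add(candidate);
    }
    return found;
  }

  // Returns the single match, or fails if there is no match or more than one.
  static String uniqueMatch(List<String> candidates, String wanted) {
    List<String> found = matches(candidates, wanted);
    if (found.size() != 1)
      throw new IllegalStateException(
        "expected exactly one match for '" + wanted + "' but found " + found);
    return found.get(0);
  }

  public static void main(String[] args) {
    // Unambiguous: only one directory entry matches "LIB".
    System.out.println(uniqueMatch(Arrays.asList("Lib", "README.md"), "LIB"));
    // Ambiguous: all three names match, so uniqueMatch would throw here.
    System.out.println(matches(Arrays.asList("File.TXT", "File.txt", "file.txt"), "file.txt"));
  }
}
```

In the helper class, the candidate names would be the getName() values of the listed directory entries, and uniqueMatch would replace the inner loop that breaks on the first hit.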

    BTW, the console log for my sample file looks like this:

    Original path: ..\..\..\downloads\powershellarsenal-master.zip\PowerShellArsenal-master\LIB\CAPSTONE\LIB\X64\LIBCAPSTONE.DLL
    Case-sensitive path: C:\Users\Alexander\Downloads\PowerShellArsenal-master.zip\PowerShellArsenal-master\Lib\Capstone\lib\x64\libcapstone.dll
    File size: 3.629.294 bytes