
Use FileChannel or Files.copy to copy file inside jar


I'm using Java 8.

To get the best performance, I tried to copy a file with Files.copy(), but soon found that it does not seem to support Chinese characters. For instance:

try {
    Files.copy(
        Objects.requireNonNull(
            Main.class.getResourceAsStream("/amres/core/template.xlsx")),
        Paths.get("C:/我的/test.xlsx"), // "我的" means mine in Chinese
        StandardCopyOption.REPLACE_EXISTING
    );
} catch (IOException e) {
    e.printStackTrace();
}

The code is meant to copy a file out of the jar, but it throws an exception (the "我的" folder was created in advance): java.nio.file.NoSuchFileException: C:\鎴戠殑\test.xlsx
The problem is that "鎴戠殑" is meaningless even to a Chinese reader, so I'm looking for a way to handle Chinese characters correctly.

I also tried FileChannel, but that failed too; I realized it works on files on disk, not on files inside a jar. What should I do?


Solution

  • You're barking up the wrong tree. Files.copy has nothing whatsoever to do with support (or lack thereof) for Chinese characters, and Java does support full Unicode pathnames. Yes, it's clear your code isn't currently working as designed, and it can be fixed, but the problem isn't Files.copy.

    Sidenote: Your code is broken

    Main.class.getResourceAsStream is the correct way to pull resources from your codebase. However, the stream it returns is a resource, and therefore you must close it. Wrap it in a try-with-resources block; that's the smart way to do it.

    Objects.requireNonNull should not be used here - its purpose is to forcibly throw a NullPointerException. That's all it does. This code will already throw an NPE if somehow that resource is missing. This means the requireNonNull is completely pointless (it is enforcing a thing that already happens), and if you want clean code, either is inappropriate: You should be rethrowing with an exception that properly conveys the notion that the app is broken.

    However, that isn't a good idea either: We don't throw exceptions for bugs or broken deployments. If you think you should, then you should wrap every line of code in your entire java project with a try/catch block, after all, evidently we can't assume anything. We can't even assume java.lang.String is available at run time - clearly this is not a sustainable point of view. In other words, you may safely assume that the resource couldn't possibly not be there for the purposes of exception flow.

    Thus, we get to this much simpler and safer code:

    // Explicit InputStream type: 'var' needs Java 10+, and the question targets Java 8.
    try (InputStream in = Main.class.getResourceAsStream("/amres/core/template.xlsx")) {
      Files.copy(in, Paths.get("C:/我的/test.xlsx"), StandardCopyOption.REPLACE_EXISTING);
    }
    

    Note that in general, catching an exception and handling it with e.printStackTrace() is also just bad, in all cases: You're printing the exception to a place it probably shouldn't go, tossing useful info such as the causal chain, and then letting code execution continue even though your code is clearly in an unexpected and therefore unknown state. The best general solution is to actually throw the exception onwards. If that is not feasible, or you just don't want to care about it right now and are relying on your IDE's auto-fixes without editing anything, then at least fix your IDE's auto-fixer to emit non-idiotic code. throw new RuntimeException("uncaught", e) is the proper 'I do not want to care about this right now' fill-in. So, fix your IDE. It's generally in the settings, under 'template'.
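
    As a rough sketch of the two options just described (declare the exception, or wrap it so nothing is swallowed), something like the following could work. The class name ResourceCopy, both method names, and the owner parameter are hypothetical and only here for illustration; the code sticks to Java 8 syntax.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceCopy {

    // Preferred: declare IOException and let the caller decide what to do.
    // 'owner' is the class whose class loader is used to look up the resource.
    static void copyResource(Class<?> owner, String resource, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails.
        try (InputStream in = owner.getResourceAsStream(resource)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Fallback for "I do not want to care about this right now":
    // wrap and rethrow, keeping the original exception as the cause.
    static void copyResourceUnchecked(Class<?> owner, String resource, Path target) {
        try {
            copyResource(owner, resource, target);
        } catch (IOException e) {
            throw new RuntimeException("uncaught", e);
        }
    }
}
```

    Either way the failure reaches the caller with its cause attached, instead of being printed and then ignored.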

    What could be causing this

    Every single time chars turn to bytes or vice versa, charset encoding is involved. Filenames look like characters, and certainly when you write code, you're writing characters, and when you see the text of the NoSuchFileException, that's characters too - but what about all the stuff in between? In addition, names in file systems themselves are unclear: Some file systems' names are byte-based. For example, Apple's APFS is entirely byte-based. File names just are bytes, and rendering these bytes onto the screen (and translating e.g. touch foobar.txt on the command line into the byte sequence for the file name) is done with UTF-8 merely by convention. In contrast, some file systems encode this notion of a set encoding directly into their APIs. Best bet is to UTF-8 all the things; that's the least chance of things going awry.
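
    To make the chars-to-bytes point concrete, here is a small sketch that encodes "我的" as UTF-8 and then decodes those bytes as GBK. That GBK was the charset involved in the question is an assumption (it is a common default on Chinese-locale Windows), and the class and method names are hypothetical. On a JVM that ships the GBK charset, the garbled result would be expected to look like the folder name in the exception message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text with one charset, then decodes the same bytes with another.
    static String misdecode(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Assumption: GBK is available on this JVM; it is not one of the
        // charsets every Java runtime is required to support.
        Charset gbk = Charset.forName("GBK");

        // "我的" written as Unicode escapes so the source file's own
        // encoding cannot interfere with the demonstration.
        String original = "\u6211\u7684";

        // UTF-8 bytes read back as GBK: the text comes out garbled.
        String garbled = misdecode(original, StandardCharsets.UTF_8, gbk);
        System.out.println("garbled : " + garbled);

        // Reversing the mistake recovers the original text.
        String restored = misdecode(garbled, gbk, StandardCharsets.UTF_8);
        System.out.println("restored: " + restored);
    }
}
```

    Nothing here touches the file system, which is the point: the name is already wrong before Files.copy ever sees it.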

    So, let's go through the steps:

    1. You write java code in a text editor. You write characters.
    2. File content is, universally, byte based, on all file systems. Therefore, when you hit the 'save' shortcut in your text editor, your characters are converted into bytes. Check that your editor is configured in UTF-8 mode. Alternatively, use backslash-u escapes to avoid the issue.
    3. You compile this code. Probably with javac, ecj, or something based on one of those two. They read in a file (so, bytes), but parse the input as characters, therefore conversion is happening. Ensure that javac/ecj is invoked with the -encoding UTF-8 parameter. If using a build tool such as maven or gradle, ensure this is explicitly configured.
    4. That code is run and prints its error to a console. A console shows it to you. The console is converting bytes (because app output is a byte-based stream) to chars in order to show it to you. Is it configured to do this using UTF-8? Check the terminal app's settings.
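
    The steps above can be checked from inside the program with a sketch like this one. It prints the JVM's default charsets and the code points the compiler actually put into the string. The class and method names are hypothetical, and sun.jnu.encoding is an internal, unsupported property that may be absent on some runtimes.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // "我的" written with Unicode escapes, so this constant is compiled
    // correctly no matter which encoding the compiler assumes for the file.
    static final String FOLDER = "\u6211\u7684";

    // Renders each code point as U+XXXX, which survives a misconfigured console.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Charset the JVM uses when none is given explicitly.
        System.out.println("default charset : " + Charset.defaultCharset());
        System.out.println("file.encoding   : " + System.getProperty("file.encoding"));
        // Internal property (assumption: present on this JVM).
        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
        // If a plain string literal in your own code prints different code
        // points than this, the editor or compiler encoding is the suspect.
        System.out.println("folder name     : " + describe(FOLDER));
    }
}
```

    Passing the same literal from your own source file to describe and comparing the output with the line for FOLDER separates a compile-time encoding problem (steps 2 and 3) from a console display problem (step 4).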

    Check the editor, compiler, and console settings called out in steps 2 to 4; there's a 95%+ chance you'll fix your problem by doing this.