Tags: netbeans, javafx, utf-8, mojibake

Mojibake in JavaFX (works fine in NetBeans)


I'm working on a simple translator project in NetBeans using JavaFX. When I run it from NetBeans, it compiles and works perfectly, with no rendering issues:

[Screenshot: appearance when launched from NetBeans]

But when I run the built JAR directly ([project-folder]\dist\Translator.jar), I get mojibake:

[Screenshot: mojibake in \dist\Translator.jar]

The same thing happens with [project-folder]\dist\run##########\Translator.jar:

[Screenshot: mojibake in \dist\run##########\Translator.jar]

There are four places where the text could be corrupted. A list of terms is sent to the translator, which uses a web service to retrieve the translations (1). These are then cached in files (2) and loaded by a parser (3), which makes the data available for display in the JavaFX window (4). I've inspected the files and they're valid UTF-8, and the parser only runs when loading an existing file, which a new deployment wouldn't have. So I've narrowed it down to the display in the JavaFX window.
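One way to check whether a given stage could be responsible is to reproduce the corruption deliberately and compare it with what appears on screen. The sketch below is only an illustration: `MojibakeDemo` and `misdecode` are hypothetical names that are not part of the project, and ISO-8859-1 is an assumed stand-in for whatever non-UTF-8 default the target machine uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: decodes the UTF-8 bytes of `text` with another
    // charset, simulating a stage that ignores the real encoding.
    static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        // This value can differ between an IDE launch and a plain
        // `java -jar` launch if the IDE passes its own encoding setting.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // e-acute (U+00E9), written as an escape so that the encoding of
        // this source file does not matter.
        String original = "\u00e9";

        // ISO-8859-1 stands in for a non-UTF-8 platform default (assumption).
        String garbled = misdecode(original, StandardCharsets.ISO_8859_1);

        System.out.println(original + " -> " + garbled);
        // 2: each of the two UTF-8 bytes became its own character.
        System.out.println("Garbled length: " + garbled.length());
    }
}
```

If the garbled output matches what the deployed JAR shows, the corruption is happening wherever bytes are turned into Strings, rather than in the JavaFX rendering itself.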


Solution

  • I'm sorry my question wasn't fantastic, but I've come across many people with similar issues and it was hard to isolate my specific case.

    The crux of the issue is that NetBeans launches every JVM session with UTF-8 as the default encoding (as far as I'm aware). Normally this isn't an issue, but once the text contains non-ASCII characters, any JVM whose default encoding is not UTF-8 is likely to produce mojibake wherever bytes are converted to or from Strings without an explicit charset. That covers many deployments, because a JVM started without an explicit file.encoding setting has traditionally taken its default charset from the host system's locale, which is frequently not UTF-8 (on Windows in particular). JDK 18 and later default to UTF-8 instead.

    The question "Java compiler platform file encoding problem" helped me address the issue. Since I can't control the JVM arguments on every system my code will run on (expecting to seems unrealistic), I opted for the solution below.

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    /**
     * Repairs a String whose UTF-8 bytes were decoded with the system default
     * encoding: re-encodes it with the default charset to recover the original
     * bytes, then decodes those bytes as UTF-8.  Run it only against Strings
     * that were mis-decoded this way (here, text shown to the end user);
     * correctly decoded text containing non-ASCII characters would be garbled.
     *
     * @param string    the mis-decoded String to repair
     * @return          the String decoded as UTF-8
     */
    public static String convertToUTF8(String string) {
        return new String(string.getBytes(Charset.defaultCharset()), StandardCharsets.UTF_8);
    }
    

    A simple little method that gets the job done, called as necessary.
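An alternative to repairing Strings after the fact is to name the charset at the point where bytes are first decoded, so the platform default never comes into play. The following is a minimal sketch under that assumption, not the code from the project: `Utf8Decode` and `readUtf8` are hypothetical names, and the byte array stands in for a web service response or a cached file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class Utf8Decode {
    // Hypothetical helper: reads the whole stream as UTF-8, regardless of
    // the platform default charset. Lines are joined with '\n'.
    static String readUtf8(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for bytes received from a web service or read from a cache file.
        String expected = "\u00e9t\u00e9";
        byte[] utf8 = expected.getBytes(StandardCharsets.UTF_8);

        String decoded = readUtf8(new ByteArrayInputStream(utf8));
        System.out.println("Decoded correctly: " + decoded.equals(expected));
    }
}
```

With the charset fixed at the source, the text should decode the same way whether the JAR is launched from NetBeans or directly, and no later conversion step is needed.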