javascript java base64 decode scriptengine

Decoding Base64 String in Java


I'm using Java and have a Base64-encoded string that I want to decode and then transform.

In JavaScript, atob() returns the decoded value I expect, but in Java, using Base64.decodeBase64(), I cannot get the same value.

Example:

For:

String str = "AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=";

With JavaScript atob(str) I get ->

"Æ‘û$‚SF3«àö“Bâ"

With Java new String(Base64.decodeBase64(str)) I get ->

"Æ?û$?SF3«à§ö?â"


Another way I could fix the issue is to run the JavaScript from Java with the Nashorn engine, but I'm getting an error near the "$" symbol.

Current Code:

ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
String script2 = "function decoMemo(memoStr){ print(atob(memoStr).split('')" + 
    ".map((aChar) => `0${aChar.charCodeAt(0).toString(16)}`" +
    ".slice(-2)).join('').toUpperCase());}";
try {
    engine.eval(script2);
    Invocable inv = (Invocable) engine;
    String returnValue = (String)inv.invokeFunction("decoMemo", memoTest );
    System.out.print("\n result: " + returnValue);
} catch (ScriptException | NoSuchMethodException e1) {
    e1.printStackTrace();
}

Any help would be appreciated. I've searched in a lot of places but can't find the correct answer.


Solution

  • btoa is broken and shouldn't be used.

    The problem is, bytes aren't characters. Base64 encoding does only one thing: it converts bytes to a stream of characters that survives just about any text-based transport mechanism. Base64 decoding does that one thing in reverse: it converts such characters back into bytes.

    And the confusion is, you're printing those bytes as if they are characters. They are not.

    You end up with the exact same bytes in both languages, but JavaScript and Java disagree on how to turn them into an ersatz string when you print them to a console. That's the mistake: bytes aren't characters. Printing them forces some charset encoding to be applied, and you don't want that, because these bytes clearly aren't meant to be printed as text.

    JavaScript sort of half-equates characters and bytes and will freely convert one to the other, picking some encoding for you without asking. Oof. JavaScript sucks in this regard; it is what it is. The MDN docs on btoa explain why you shouldn't use it. You're running into that problem.

    I'm not entirely sure how you'd fix it in JavaScript, but perhaps you don't need it. Java is decoding the bytes perfectly well, as is JavaScript, but JavaScript then turns those bytes into characters in some silly fashion, and that's causing the problem.
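
To make the "bytes aren't characters" point concrete, here is a minimal sketch of staying in bytes on the Java side. It assumes the JDK's own java.util.Base64 (Java 8+) rather than the Base64.decodeBase64() call from the question, and the class and method names (MemoDecoder, decode, toHex, toBinaryString) are made up for illustration. toHex appears to match what the question's decoMemo script is trying to produce (two uppercase hex digits per byte). toBinaryString is only there to show how an atob()-style string could be reproduced if one is really needed: atob() returns a string whose char codes equal the byte values, which is what decoding as ISO-8859-1 gives you in Java.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class; the names are illustrative, not from the question.
final class MemoDecoder {

    // Decode Base64 text into raw bytes. No charset is involved here.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // Render each byte as two uppercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Map every byte to the char with the same numeric value (0..255),
    // using an explicit charset instead of the platform default.
    static String toBinaryString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String str = "AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=";
        byte[] bytes = decode(str);

        // Hex view of the decoded bytes, safe to print anywhere.
        System.out.println(toHex(bytes));

        // Char codes of the Latin-1 view equal the byte values.
        String binary = toBinaryString(bytes);
        System.out.println(binary.length() == bytes.length);
    }
}
```

If the hex string is all that is needed, toHex on the decoded bytes should be enough, and the Nashorn detour can probably be dropped. The mismatch in the question most likely comes from new String(bytes) with no charset argument, which uses the platform default charset and replaces bytes it cannot map.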