Tags: javascript, base64, compression, data-uri, btoa

Is it possible to make a reversible atob from btoa?


Why can't btoa convert a Base64 string back?

let b64 = "abc123"
console.log(btoa(atob(b64))); //abc12w==

What happened at the end of the string?

I store images in localStorage as data URIs, which are basically Base64 strings with a header. Both use only 7-bit ASCII. To use the unused bits (JavaScript strings are made of 16-bit code units) and halve the length, I wrote these functions, but they seem to break at btoa:

function compress(dataURI_or_base64) { // 8-bit to 16-bit string (halves the length)
    var bin = atob(dataURI_or_base64.slice(dataURI_or_base64.indexOf(',') + 1)); // remove header
    var len = bin.length;
    var arr = new Uint8Array(len);
    for (var i = 0; i < len; i++) arr[i] = bin.charCodeAt(i);
    // Caution: with default options TextDecoder replaces unpaired surrogates and a
    // dangling odd byte with U+FFFD, so not every byte sequence survives this step.
    return new TextDecoder('utf-16').decode(arr);
}

function decompress(str, mimeType) { // 16-bit to 8-bit string (get back)
    var arr = new Uint8Array(str.length * 2), i = 0, j = 0;
    while (i < str.length) {
        var cc = str.charCodeAt(i++)
        arr[j++] = cc & 0xff;
        arr[j++] = cc >> 8;
    }
    var bin = '', i = 0;
    while (i < arr.byteLength) bin += String.fromCharCode(arr[i++]);
    let dataURI = 'data:' + mimeType + ';base64,'
    return (mimeType? dataURI: "") + window.btoa(bin);
}
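A short sketch of one likely problem with this version, assuming it is why TextDecoder later had to be removed: with its default options, TextDecoder does not throw on malformed UTF-16 but substitutes U+FFFD, and arbitrary image bytes can form unpaired surrogates or leave a dangling final byte. The byte values below are made up for illustration; the snippet is meant for a browser console or a recent Node.js.

```javascript
// Bytes 00 d8 form the little-endian code unit 0xD800, an unpaired high surrogate.
const surrogateBytes = new Uint8Array([0x00, 0xd8]);
// Three bytes: "A" (41 00) followed by a final byte 0x42 with no partner.
const oddBytes = new Uint8Array([0x41, 0x00, 0x42]);

// Default options (fatal: false) replace malformed input with U+FFFD instead of throwing.
const decoder = new TextDecoder("utf-16le");
const fromSurrogate = decoder.decode(surrogateBytes);
const fromOdd = decoder.decode(oddBytes);

console.log(fromSurrogate.charCodeAt(0).toString(16)); // fffd, not d800
console.log(fromOdd.length, fromOdd.charCodeAt(1).toString(16)); // the byte 0x42 is gone

// Packing with String.fromCharCode, as the updated compress() does, keeps the code unit.
const packed = String.fromCharCode(0x00 | (0xd8 << 8));
console.log(packed.charCodeAt(0).toString(16)); // d800
```

Since the replaced bytes cannot be recovered, decompress() cannot rebuild the original Base64 from such a string.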

What is wrong with btoa(atob(b64)), and can we get it to work (make it reversible)?

In other words, I want a function atob_rev() such that atob_rev(atob(b64)) === b64 is always true. How can I write an atob_rev() to replace btoa?

No helpful answers found (2023.07.30):

why-are-atob-and-btoa-not-reversible

javascript-code-and-decode-from-base64-using-atob-and-btoa-functions

Note that atob must be applied first: btoa makes a string longer, whereas atob makes it shorter (every 4 Base64 characters become 3 bytes), and the result can be halved again by packing it into 16-bit characters. So any answer that applies btoa first is wrong!

Update (2023.08.01)

I have accepted the answer! My example was not a legal Base64 string.

I also had to remove TextDecoder to get it to work. Now the code is:

function compress(dataURI_or_base64) {
    let i = 0, a = dataURI_or_base64; // Base64: ASCII characters carrying 6 bits each
    a = a.slice(a.indexOf(',') + 1); // remove header
    if (a.length % 4) alert("Padding error\n" + a);

    let bin = window.atob(a), str16bit = ""; // one character per byte (0..255)
    while (i < bin.length) {
      let lo = bin.charCodeAt(i++);
      let hi = bin.charCodeAt(i++); // NaN past the end; NaN << 8 evaluates to 0
      str16bit += String.fromCharCode(lo | hi << 8); // two bytes per 16-bit character
    }
    return str16bit;
}

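The lossless compression is impressive: over 50%! (Base64 holds 3 bytes in 4 characters, and each 16-bit character now holds 2 bytes, so the string shrinks to about 3/8 of its Base64 length.)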
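One caveat on "lossless", assuming the earlier decompress() is still used as the partner of the new compress(): when the decoded byte count is odd, bin.charCodeAt(i++) past the end returns NaN, NaN << 8 evaluates to 0, and decompress() then emits one extra 0x00 byte, so btoa returns a different string (for example "aGVsbG8=" comes back as "aGVsbG8A"). Below is a minimal sketch of one way to close that gap. pack16 and unpack16 are hypothetical names, and the leading flag character that records an odd byte count is an assumption of this sketch, not part of the question's format.

```javascript
// Hypothetical pack16: Base64 (or data URI) -> string of 16-bit code units.
// The first character is a flag: 0 = even byte count, 1 = odd byte count.
function pack16(dataURI_or_base64) {
  const b64 = dataURI_or_base64.slice(dataURI_or_base64.indexOf(",") + 1); // remove header
  const bin = atob(b64); // one character per byte, codes 0..255
  let out = String.fromCharCode(bin.length % 2);
  for (let i = 0; i < bin.length; i += 2) {
    const lo = bin.charCodeAt(i);
    const hi = i + 1 < bin.length ? bin.charCodeAt(i + 1) : 0; // explicit filler byte
    out += String.fromCharCode(lo | (hi << 8));
  }
  return out;
}

// Hypothetical unpack16: inverse of pack16, optionally re-adding a data URI header.
function unpack16(packed, mimeType) {
  const odd = packed.charCodeAt(0) === 1;
  let bin = "";
  for (let i = 1; i < packed.length; i++) {
    const cc = packed.charCodeAt(i);
    bin += String.fromCharCode(cc & 0xff, cc >> 8); // low byte, then high byte
  }
  if (odd) bin = bin.slice(0, -1); // drop the filler byte added by pack16
  const b64 = btoa(bin);
  return mimeType ? "data:" + mimeType + ";base64," + b64 : b64;
}

console.log(unpack16(pack16("aGVsbG8=")) === "aGVsbG8="); // true (5 bytes, odd)
```

The flag costs one character per stored string. Because of it, strings produced by the question's compress() and by pack16 are not interchangeable.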

The new compress() also checks the padding (a.length % 4 === 0), so it now reports an error for the example "abc123". That check is stricter than atob itself, which accepts unpadded input unless its length leaves a remainder of 1 when divided by 4.

An alternative would be to pad automatically inside compress() and strip the padding off again in decompress(). That may be useful for encoding and decoding 6-bit ASCII text messages.
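A minimal sketch of that padding idea, with padBase64, stripPadding, and roundTrip as hypothetical helper names. It also shows a limit that follows from the accepted answer below: appending "=" satisfies the length check, but it cannot restore non-zero pad bits, so a 6-bit message whose last character carries such bits does not come back unchanged.

```javascript
// Hypothetical helper: append "=" until the length is a multiple of 4.
// A remainder of 1 can never be valid: a single 6-bit character is not a whole byte.
function padBase64(s) {
  if (s.length % 4 === 1) throw new Error("Invalid Base64 length: " + s.length);
  return s + "=".repeat((4 - (s.length % 4)) % 4);
}

// Hypothetical helper: remove trailing "=" characters again.
function stripPadding(s) {
  return s.replace(/=+$/, "");
}

// Send a 6-bit "message" through atob/btoa with automatic padding.
function roundTrip(msg) {
  return stripPadding(btoa(atob(padBase64(msg))));
}

console.log(roundTrip("abc12w")); // abc12w (pad bits are already zero)
console.log(roundTrip("abc123")); // abc12w (pad bits 0111 are lost)
console.log(roundTrip("abcd"));   // abcd (length divisible by 4, no pad bits)
```

So automatic padding is only safe for messages whose length is a multiple of 4 or whose final character leaves zero pad bits.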


Solution

  • Because "abc123" is not a valid Base64 encoding of anything. You made that up.

    No, it's not possible. btoa(atob(x)) will return x if and only if x is a valid Base64 encoding. That means that any padding bits are zeros, and that such padding is indicated by one or two = signs, each one representing two padding bits. (Some encodings permit omitting the equal signs, since the padding can be inferred.)

    Your "abc123" is six Base64 characters representing 36 bits of data. Since a byte is eight bits, you only get 32 bits out of that. For most Base64 encodings, you need to follow that with two equal signs to indicate that the extra four bits are discarded. Those extra four bits are supposed to be zeros, but they are not (they are 0111 in this case), so this is not a valid Base64 encoding, regardless of whether there are or are not equal signs.

    Most Base64 decoders are liberal and will decode the input regardless of non-zero pad bits. As a result, for the same four bytes there are 16 possible encodings (one for each value of the four pad bits), only one of which is valid Base64, and a liberal decoder gives the same output for all of them. Or 48 possible ways if you also count the forms with zero or one equal sign instead of two, which liberal decoders are likely to accept as well.

    E.g. "abc12z", "abc127=", and "abc12/==" will all give the same four bytes, 69 b7 35 db. (Note: JavaScript's atob actually rejects the second one of those. It only accepts zero equal signs or the correct number of equal signs.)

    Therefore atob() is not a one-to-one mapping of arbitrary strings of Base64 characters to byte strings. Since it is not a one-to-one mapping, it is not reversible.

    btoa() is a one-to-one mapping: atob(btoa(y)) will always give y for any string y that btoa accepts (btoa throws if a character is above U+00FF).
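The answer's reasoning can be checked in a few lines. This is a sketch assuming a browser or Node.js 16+, where atob and btoa are globals; ALPHABET, base64Bits, hexBytes, and isCanonicalBase64 are hypothetical names introduced here. It splits "abc123" into its 36 bits, decodes the equivalent spellings mentioned above, and confirms that btoa round-trips every byte value.

```javascript
// Standard Base64 alphabet (RFC 4648, section 4): index = 6-bit value.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Concatenate the 6-bit value of every character into a string of 0s and 1s.
function base64Bits(s) {
  return [...s]
    .map((ch) => ALPHABET.indexOf(ch).toString(2).padStart(6, "0"))
    .join("");
}

// Format a binary string (one character per byte) as space-separated hex.
function hexBytes(bin) {
  return [...bin]
    .map((ch) => ch.charCodeAt(0).toString(16).padStart(2, "0"))
    .join(" ");
}

// btoa(atob(s)) === s holds exactly for canonical, padded Base64.
function isCanonicalBase64(s) {
  try {
    return btoa(atob(s)) === s;
  } catch {
    return false; // atob rejected the input
  }
}

// 1. "abc123" carries 36 bits: 32 fill four bytes, 4 are left over.
const bits = base64Bits("abc123");
const fullBytes = bits.slice(0, 32).match(/.{8}/g).map((b) => parseInt(b, 2));
console.log(fullBytes.map((n) => n.toString(16)).join(" ")); // 69 b7 35 db
console.log(bits.slice(32)); // 0111 (canonical Base64 requires 0000)

// 2. Several spellings decode to the same four bytes; only one is canonical.
for (const s of ["abc123", "abc12z", "abc12/==", "abc12w==", "abc127="]) {
  let result;
  try {
    const bin = atob(s);
    result = hexBytes(bin) + " -> " + btoa(bin);
  } catch (e) {
    result = "rejected by atob (" + e.name + ")";
  }
  console.log(s.padEnd(9), result, isCanonicalBase64(s) ? "(canonical)" : "");
}

// 3. btoa is one-to-one: every byte value 0..255 survives atob(btoa(y)).
let y = "";
for (let i = 0; i < 256; i++) y += String.fromCharCode(i);
console.log(atob(btoa(y)) === y); // true
```

Every accepted spelling re-encodes to "abc12w==", which is why no function can play the role of atob_rev() for non-canonical input.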