javascript python base64 base64url

How to get same results from base64 atob in Javascript vs Python


I found some code online that decodes base64, and I am trying to work through it. I know Python has base64.urlsafe_b64decode(), but I would like to learn a bit more about what is going on.

The JS atob looks like:

function atob (input) {
  var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
  var str = String(input).replace(/=+$/, '');
  if (str.length % 4 == 1) {
    throw new InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.");
  }
  for (
    // initialize result and counters
    var bc = 0, bs, buffer, idx = 0, output = '';
    // get next character
    buffer = str.charAt(idx++);
    // character found in table? initialize bit storage and add its ascii value;
    ~buffer && (bs = bc % 4 ? bs * 64 + buffer : buffer,
      // and if not first of each 4 characters,
      // convert the first 8 bits to one ascii character
      bc++ % 4) ? output += String.fromCharCode(255 & bs >> (-2 * bc & 6)) : 0
  ) {
    // try to find character in table (0-63, not found => -1)
    buffer = chars.indexOf(buffer);
  }
  return output;
}

My goal is to port this to Python, but I am trying to understand what the for loop is doing in JavaScript.

It checks if the value is located in the chars table and then initializes some variables using a ternary like: bs = bc % 4 ? bs*64+buffer: buffer, bc++ %4

I am not quite sure I understand what the buffer, bc++ % 4 part of the ternary is doing. The comma confuses me a bit. Plus the String.fromCharCode(255 & (bs >> (-2 * bc & 6))) is a bit esoteric to me.

I've been trying something like this in Python, which produces some results, albeit different from what the JavaScript implementation produces:

import re

# Test subject
b64_str: str = "fwHzODWqgMH+NjBq02yeyQ=="
    
# Lookup table for characters
chars: str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

# Replace right padding with empty string
replaced = re.sub("=+$", '', b64_str)
if len(replaced) % 4 == 1:
    raise ValueError("atob failed. The string to be decoded is not valid base64")

# Bit storage and counters
bc = 0
out: str = ''
for i in replaced:

    # Get ascii value of character
    buffer = ord(i)

    # If counter is evenly divisible by 4, return buffer as is, else add the ascii value
    bs = bc * 64 + buffer if bc % 4 else buffer
    bc += 1 % 4 # Not sure I understand this part
    
    # Check if character is in the chars table
    if i in chars:

        # Check if the bit storage and bit counter are non-zero
        if bs and bc:
            # If so, convert the first 8 bits to an ascii character
            out += chr(255 & bs >> (-2 * bc & 6))
        else:
            out = 0
            
    # Set buffer to the index of where the first instance of the character is in the b64 string
    print(f"before: {chr(buffer)}")
    buffer = chars.index(chr(buffer))
    print(f"after: {buffer}")
    
print(out)

JS gives ó85ªÁþ60jÓlÉ

Python gives 2:u1(²ë:ð1G>%Y


Solution

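    • The loop walks the input one character at a time and maps each Base64 character to its 6-bit value (its index in chars). Four characters form one 24-bit group, which decodes to three bytes.
    • bc counts the characters consumed so far, so bc % 4 is the position inside the current 24-bit group.
    • bs accumulates the bits of the current group (bs * 64 is a left shift by 6 bits), and output builds the decoded string by converting 8-bit slices of bs to characters.
    • The comma operator evaluates both operands and yields the second one, so (bs = ..., bc++ % 4) first updates bs and then yields bc % 4 as it was before the increment. That value is 0 for the first character of a group, so nothing is emitted; for the other three, -2 * bc & 6 evaluates to 4, 2 and 0, and 255 & (bs >> shift) extracts the next complete byte.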
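
To make the points above concrete, here is a minimal sketch of a line-by-line port of the JavaScript loop, with the comma expression split into separate statements. The name atob_literal and the use of ValueError are my own choices, not part of the original code. Like the JavaScript version, it silently skips characters that are not in the table.

```python
CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def atob_literal(input_str: str) -> str:
    """Line-by-line port of the JS loop (hypothetical helper name)."""
    s = str(input_str).rstrip("=")
    if len(s) % 4 == 1:
        raise ValueError("'atob' failed: The string to be decoded is not correctly encoded.")

    bc = 0       # number of valid characters consumed so far
    bs = 0       # bit storage for the current group of four characters
    output = []

    for ch in s:                    # JS condition: buffer = str.charAt(idx++)
        buffer = CHARS.find(ch)     # JS loop body: character -> table index, or -1
        if buffer == -1:            # ~(-1) is 0, so the JS code skips unknown characters
            continue

        # Left operand of the comma: start a new group or append 6 more bits.
        bs = bs * 64 + buffer if bc % 4 else buffer

        # Right operand of the comma: bc++ % 4 yields the value BEFORE the increment.
        position = bc % 4
        bc += 1

        # position == 0 means first character of a group: no complete byte yet.
        if position:
            # -2 * bc & 6 is 4, 2, 0 for the 2nd, 3rd and 4th character of a group.
            output.append(chr(255 & bs >> (-2 * bc & 6)))

    return "".join(output)


print(atob_literal("SGVsbG8gd29ybGQ="))
print([-2 * bc & 6 for bc in (2, 3, 4)])
```

The attempt in the question differs from this in three places: it stores ord(i) (the ASCII code) in bs instead of the table index, it multiplies bc rather than bs by 64, and bc += 1 % 4 parses as bc += (1 % 4), which is just bc += 1 with no test on the old value.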

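    Here is a tested version (note that in this version bc counts the bits currently buffered in bs rather than the characters consumed): https://www.online-python.com/PiseKNFuaO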

    import base64
    
    class InvalidCharacterError(Exception):
        pass
    
    def atob(input_str):
        chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='
        input_str = str(input_str).rstrip('=')
        
        if len(input_str) % 4 == 1:
            raise InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.")
        
        output = []
        bc = 0
        bs = 0
        buffer = 0
        
        for char in input_str:
            buffer = chars.find(char)
            
            if buffer == -1:
                raise InvalidCharacterError("'atob' failed: The string to be decoded contains an invalid character.")
            
            bs = (bs << 6) + buffer
            bc += 6
            
            if bc >= 8:
                bc -= 8
                output.append(chr((bs >> bc) & 255))
        
        return ''.join(output)
    
    # Compare with Python's built-in Base64 decoding
    def test_atob():
        test_strings = [
            "SGVsbG8gd29ybGQ=",  # "Hello world"
            "U29mdHdhcmUgRW5naW5lZXJpbmc=", # "Software Engineering"
            "VGVzdGluZyAxMjM=", # "Testing 123"
            "SGVsbG8gd29ybGQ==",  # "Hello world" with extra padding
            "SGVsbG8gd29ybGQ= ",  # "Hello world" with trailing space (invalid)
            "SGVsbG8gd29ybGQ\r\n",  # "Hello world" with newline characters (invalid)
            "Invalid!!==",  # Invalid characters
            "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZyE", # "This is an encoded string!" without padding
            "U29tZVNwZWNpYWwgQ2hhcnM6ICsgLyA=", # "SomeSpecial Chars: + / " with padding
        ]
        
        for encoded in test_strings:
            try:
                # atob yields one character per byte, so decode as Latin-1 rather than UTF-8
                expected = base64.b64decode(encoded).decode('latin-1')
                result = atob(encoded)
                print(result == expected, "Custom:", result, "Expected:", expected)
            except Exception as e:
                print(f"Error for string: {encoded} - {e}")
    
    test_atob()
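
If the goal is only to reproduce the browser's output rather than to reimplement the loop, the standard library is probably enough. atob uses the standard alphabet (+ and /), so base64.b64decode is a closer match than base64.urlsafe_b64decode, and decoding the resulting bytes as Latin-1 gives one character per byte, as atob does. The following is a sketch under those assumptions; the name atob_stdlib and the re-padding step are mine, added for illustration.

```python
import base64


def atob_stdlib(input_str: str) -> str:
    """Decode Base64 into a one-character-per-byte string (hypothetical helper)."""
    # Assumption: like the JS snippet, tolerate missing or extra '=' padding
    # by stripping it and re-padding to a multiple of four characters.
    stripped = input_str.rstrip("=")
    padded = stripped + "=" * (-len(stripped) % 4)
    raw = base64.b64decode(padded)
    # Latin-1 maps each byte 0-255 to the code point with the same number.
    return raw.decode("latin-1")


result = atob_stdlib("fwHzODWqgMH+NjBq02yeyQ==")
print(len(result))
print([hex(ord(c)) for c in result])
```

For the string from the question this yields 16 characters. Four of them (0x7f, 0x01, 0x80 and 0x9e) are non-printing, which is why the JS output quoted in the question shows only 12 visible characters.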