python encoding base64 decoding

Base64 decoding and encoding give different results


I have the following two encoded strings:

base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

Using an online Base64 decoder/encoder, I get the following results (which are the right ones):

base64_str1_decoded = '{"section_offset":2,"items_offset":36,"version":1}7'
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'

However, when I try to encode base64_str1_decoded or base64_str2_decoded back to Base64, I don't get the initial Base64 strings.

For instance, here is the output of the following code:

import base64

base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
recoded_str2 = base64.b64encode(bytes(base64_str2_decoded, 'utf-8'))
print(recoded_str2)

# output = b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
# expected_output = eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D

I tried changing the encoding scheme but can't seem to make it work.


Solution

  • Notice the extra 7 at the end of base64_str1_decoded? It is there because your input strings are not plain Base64: they are percent-encoded for use in a URL. %3D is the escape code for =, so = is what should have been entered into the online decoder. For the second string, the decoder also shows an extra ÃÜ on the next line, which you haven't shown, because the input ends in %3D%3D instead of ==. That online decoder is accepting invalid Base64 instead of rejecting it.

    To decode correctly in Python, first call urllib.parse.unquote on the string to remove the escaping:

    import base64
    import urllib.parse
    
    base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
    base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'
    
    # Demonstrate Python decoder detects invalid B64 encoding
    try:
        print(base64.b64decode(base64_str1))
    except Exception as e:
        print('Exception:', e)
    try:
        print(base64.b64decode(base64_str2))
    except Exception as e:
        print('Exception:', e)
    
    # Decode after unquoting...
    base64_str1_decoded = base64.b64decode(urllib.parse.unquote(base64_str1))
    base64_str2_decoded = base64.b64decode(urllib.parse.unquote(base64_str2))
    print(base64_str1_decoded)
    print(base64_str2_decoded)
    
    # Re-encode to see the valid B64 encoding.
    recoded_str1 = base64.b64encode(base64_str1_decoded)
    recoded_str2 = base64.b64encode(base64_str2_decoded)
    print(recoded_str1)
    print(recoded_str2)
    

    Output:

    Exception: Invalid base64-encoded string: number of data characters (69) cannot be 1 more than a multiple of 4
    Exception: Incorrect padding
    b'{"section_offset":2,"items_offset":36,"version":1}'
    b'{"section_offset":0,"items_offset":0,"version":1}'
    b'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0='
    b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
    

    Note that the b'' notation is how Python shows that the object is a byte string rather than a Unicode string; it is not part of the string itself.
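
The expected output in the question ends in %3D%3D, which is the percent-escaped form of the Base64 text rather than Base64 itself. As a minimal sketch of the full round trip, assuming the strings are meant to travel inside a URL, the escaping can be reapplied with urllib.parse.quote after encoding. The helper names decode_url_base64 and encode_url_base64 are made up for this illustration and are not part of any library:

```python
import base64
import urllib.parse


def decode_url_base64(escaped: str) -> str:
    """Undo the URL percent-escaping, then Base64-decode to text."""
    unescaped = urllib.parse.unquote(escaped)  # '%3D' becomes '='
    # validate=True rejects characters outside the Base64 alphabet
    # instead of silently skipping them.
    raw = base64.b64decode(unescaped, validate=True)
    return raw.decode('utf-8')


def encode_url_base64(text: str) -> str:
    """Base64-encode text, then percent-escape the result for a URL."""
    encoded = base64.b64encode(text.encode('utf-8')).decode('ascii')
    # safe='' also escapes '/', which quote() leaves alone by default.
    return urllib.parse.quote(encoded, safe='')


base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

decoded = decode_url_base64(base64_str2)
print(decoded)
print(encode_url_base64(decoded) == base64_str2)
```

Calling .decode('ascii') on the result of b64encode turns the byte string into an ordinary str, so the b'' prefix no longer appears when it is printed. Passing safe='' is a choice made here so that every character with a special meaning in a URL is escaped; whether that matches the service that produced the original strings is an assumption.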