I have the following two encoded strings:
base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'
Using an online Base64 decoder/encoder, I get the following results (which are the correct ones):
base64_str1_decoded = '{"section_offset":2,"items_offset":36,"version":1}7'
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
However, when I try to encode base64_str1_decoded or base64_str2_decoded back to Base64, I can't obtain the initial Base64 strings.
For instance, here is the output of the following code:
import base64
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
recoded_str2 = base64.b64encode(bytes(base64_str2_decoded, 'utf-8'))
print(recoded_str2)
# output = b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
# expected_output = eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D
I tried changing the encoding scheme but can't seem to make it work.
Notice the extra 7 at the end of base64_str1_decoded? That's because your input strings are not plain Base64: they are URL-encoded (percent-encoded), as required for use in URLs. %3D is the escape code for =, and = is what should be entered into the online decoder instead. You'll also notice that for the second string the decoder outputs an extra ÃÜ on the next line, which you haven't shown, because it received %3D%3D instead of ==. That online decoder is allowing invalid Base64 to be decoded.
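To see where the stray 7 and ÃÜ come from, here is a minimal sketch of a permissive decoder. How the online tool actually behaves is an assumption, and the helper name lenient_b64decode is hypothetical: it silently drops every character outside the Base64 alphabet, so the % disappears while the 3 and D of %3D are treated as Base64 data.

```python
import base64
import string

# Standard Base64 data characters; '=' and '%' are deliberately excluded.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")

def lenient_b64decode(s: str) -> bytes:
    """Hypothetical permissive decoder (assumed behaviour of the online tool):
    drop non-alphabet characters, discard a dangling single character,
    then re-pad and decode whatever is left."""
    data = "".join(ch for ch in s if ch in B64_ALPHABET)
    if len(data) % 4 == 1:
        data = data[:-1]  # a lone trailing character cannot encode a full byte
    data += "=" * (-len(data) % 4)  # re-add padding up to a multiple of 4
    return base64.b64decode(data)

base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

# The last 4-character group becomes "MX03", which decodes to b'1}7'
print(lenient_b64decode(base64_str1))
# Ends with 0x0D (carriage return, hence "next line"), then 0xC3 0xDC ('ÃÜ' in Latin-1)
print(lenient_b64decode(base64_str2))
```

Under these assumptions the first call yields b'{"section_offset":2,"items_offset":36,"version":1}7' and the second ends in b'}\r\xc3\xdc', matching what the online decoder displayed.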
To decode correctly in Python, first use urllib.parse.unquote on the string to remove the escaping:
import base64
import urllib.parse
base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'
# Demonstrate that Python's decoder detects the invalid B64 encoding
try:
    print(base64.b64decode(base64_str1))
except Exception as e:
    print('Exception:', e)
try:
    print(base64.b64decode(base64_str2))
except Exception as e:
    print('Exception:', e)
# Decode after unquoting...
base64_str1_decoded = base64.b64decode(urllib.parse.unquote(base64_str1))
base64_str2_decoded = base64.b64decode(urllib.parse.unquote(base64_str2))
print(base64_str1_decoded)
print(base64_str2_decoded)
# See valid B64 encoding.
recoded_str1 = base64.b64encode(base64_str1_decoded)
recoded_str2 = base64.b64encode(base64_str2_decoded)
print(recoded_str1)
print(recoded_str2)
Output:
Exception: Invalid base64-encoded string: number of data characters (69) cannot be 1 more than a multiple of 4
Exception: Incorrect padding
b'{"section_offset":2,"items_offset":36,"version":1}'
b'{"section_offset":0,"items_offset":0,"version":1}'
b'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0='
b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
Note that the b'' notation is Python's indication that the object is a byte string (bytes) as opposed to a Unicode string (str); it is not part of the string itself.
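If you actually need the original escaped form back (for example, to put it into a URL again), reverse both steps: Base64-encode, then percent-encode with urllib.parse.quote. This is a minimal sketch assuming the payload is compact JSON (no spaces after : and ,) as in your examples; the helper names decode_token and encode_token are hypothetical. Passing safe='' also escapes /, which is an assumption about how the original strings were produced. These particular strings contain no / or +, so it makes no difference here.

```python
import base64
import json
import urllib.parse

def decode_token(token: str) -> dict:
    """Hypothetical helper: undo the URL escaping, then Base64-decode the JSON."""
    raw = base64.b64decode(urllib.parse.unquote(token))
    return json.loads(raw)

def encode_token(obj: dict) -> str:
    """Hypothetical helper: compact JSON -> Base64 -> percent-encoding."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    b64 = base64.b64encode(raw).decode("ascii")  # bytes -> str, so no b'' wrapper
    return urllib.parse.quote(b64, safe="")  # '=' -> '%3D' (also escapes '+' and '/')

base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

for token in (base64_str1, base64_str2):
    obj = decode_token(token)
    print(obj)
    print(encode_token(obj) == token)  # round trip reproduces the escaped input
```

Re-serialising through json.dumps only reproduces the exact input because the key order and compact separators match; if the producer formats its JSON differently, re-encode the decoded bytes directly instead.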