I'm opening an image in binary mode with Python 3, then splitting that data at a specific marker (\xff\xda).
Everything after that marker is stored in a variable in which I'd like to replace all a's with e's, but I'm having trouble converting the binary data to a string:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 13: ordinal not in range(128)
```python
with open(filein, "rb") as rd:
    with open(fileout,'wb') as wr:
        img = rd.read()
        if img.find(b'\xff\xda'): ## ff da start of scan
            splitimg = img.split(b'\xff\xda', 1)
            wr.write(splitimg[0])
            scanimg = splitimg[1]
            scanglitch = ""
            scanimg = scanimg.encode()
            for letter in scanimg :
                if letter not in 'a':
                    scanglitch += letter
                else :
                    scanglitch += 'e'
            print(scanimg)
            wr.write(b'\xff\xda')
            content = scanglitch.decode()
            wr.write(content)
```
Aren't `encode()` and `decode()` the right way to convert binary data to strings and back? Thanks.
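When dealing with binary data, you'll want to stay in `bytes` as much as possible. `encode()` turns a `str` into `bytes` and `decode()` turns `bytes` back into a `str`, but there's no guarantee that arbitrary binary data is valid in your chosen text encoding: the byte 0xe6 in your traceback, for example, is outside ASCII's 0 to 127 range.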
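A small sketch of what happens in each direction. The sample bytes `raw` are made up for illustration and are not taken from an actual image:

```python
# str -> bytes uses encode(); bytes -> str uses decode().
text = "café"
data = text.encode("utf-8")
print(data)                      # b'caf\xc3\xa9'
assert data.decode("utf-8") == text

# Hypothetical sample resembling JPEG scan data (not from a real file).
raw = b"\xff\xda\xe6a"

# In Python 3, bytes objects have no encode() method at all.
print(hasattr(raw, "encode"))    # False

# Decoding fails because these bytes are not valid ASCII (or UTF-8) text.
for codec in ("ascii", "utf-8"):
    try:
        raw.decode(codec)
    except UnicodeDecodeError as exc:
        print(codec, "->", exc.reason)

# latin-1 maps every byte 0-255 to one code point, so it does round-trip,
# but it only adds a detour; working on the bytes directly is simpler.
assert raw.decode("latin-1").encode("latin-1") == raw
```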
Just remember that `bytes` objects are fundamentally sequences of 8-bit unsigned integers (0 to 255), even if they have the convenient string-like `b'xyz'` syntax.
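A quick illustration of that integer view, using an arbitrary sample value:

```python
chunk = b"\xff\xdaabc"

print(chunk[0])                  # 255 (indexing gives an int, not a 1-byte string)
print(list(chunk))               # [255, 218, 97, 98, 99]
print(chunk[2:3])                # b'a' (slicing gives a bytes object)
print(chunk[2] == ord("a"))      # True, so compare against ord("a") or b"a"[0]
print(bytes([97, 101]))          # b'ae' (a list of ints turns back into bytes)
```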
```python
filein = "download.jpeg"
fileout = "glitch.jpg"

with open(filein, "rb") as rd:
    img = rd.read()

# partition() does not raise if there's no FFDA: it returns (img, b"", b"").
# Without that marker we can't process the file anyway, so fail loudly.
prelude, marker, scanimg = img.partition(b"\xff\xda")
if not marker:
    raise ValueError("no FFDA (start of scan) marker found")

scanglitch = []
for letter in scanimg:  # iterating a bytes object yields ints, so we compare against ord()
    if letter != ord("a"):
        scanglitch.append(letter)
    else:
        scanglitch.append(ord("e"))

with open(fileout, "wb") as wr:
    wr.write(prelude)
    wr.write(marker)
    wr.write(bytes(scanglitch))
```
(I am aware the replacement logic could be written as a list comprehension, but I figured it'd be more friendly like this.)
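For comparison, here is a sketch of the same replacement written as a list comprehension, plus a variant using `bytes.replace()`, which performs the substitution directly on the bytes object. The helper names `glitch_scan` and `glitch_scan_replace` are hypothetical, and both work on in-memory bytes so they can be tried without a JPEG file:

```python
SOS = b"\xff\xda"  # JPEG start-of-scan marker, as in the question


def glitch_scan(img: bytes) -> bytes:
    """Hypothetical helper: replace every b"a" after the first FFDA with b"e"."""
    prelude, marker, scanimg = img.partition(SOS)
    if not marker:
        raise ValueError("no FFDA (start of scan) marker found")
    a, e = ord("a"), ord("e")
    # List-comprehension form of the loop above.
    scanglitch = bytes([e if byte == a else byte for byte in scanimg])
    return prelude + marker + scanglitch


def glitch_scan_replace(img: bytes) -> bytes:
    """Hypothetical helper: same result, using bytes.replace()."""
    prelude, marker, scanimg = img.partition(SOS)
    if not marker:
        raise ValueError("no FFDA (start of scan) marker found")
    return prelude + marker + scanimg.replace(b"a", b"e")


sample = b"a header \xff\xda banana \xff\xda a"
print(glitch_scan(sample))  # b'a header \xff\xda benene \xff\xda e'
```

Only bytes after the first marker are touched, so the `a` in the prelude survives. To run it on a real file, something like `Path(fileout).write_bytes(glitch_scan(Path(filein).read_bytes()))` with `pathlib.Path` should work.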