I am currently experimenting with how Python 3 handles bytes when reading, and writing data and I have come across a particularly troubling problem that I can't seem to find the source of. I am basically reading bytes out of a JPEG file, converting them to an integer using ord()
, then returning the bytes to their original character using the line chr(character).encode('utf-8')
and writing it back into a JPEG file. No issue right? Well when I go to try opening the JPEG file, I get a Windows 8.1 notification saying it can not open the photo. When I check the two files against each other one is 5.04MB, and the other is 7.63MB which has me awfully confused.
def __main__():
operating_file = open('photo.jpg', 'rb')
while True:
data_chunk = operating_file.read(64*1024)
if len(data_chunk) == 0:
print('COMPLETE')
break
else:
new_operation = open('newFile.txt', 'ab')
for character in list(data_chunk):
new_operation.write(chr(character).encode('utf-8'))
if __name__ == '__main__':
__main__()
This is the exact code I am using, any ideas on what is happening and how I can fix it?
NOTE: I am assuming that the list of numbers that list(data_chunk)
provides is the equivalent to ord()
.
Here is a simple example you might wish to play with:
import sys
f = open('gash.txt', 'rb')
stuff=f.read() # stuff refers to a bytes object
f.close()
print(stuff)
f2 = open('gash2.txt', 'wb')
for i in stuff:
f2.write(i.to_bytes(1, sys.byteorder))
f2.close()
As you can see, the bytes object is iterable, but in the for
loop we get back an int
in i
. To convert that to a byte I use int.to_bytes()
method.