Tags: python, python-3.x, read-write, bytestream

Writing more data to file than reading?


I am currently experimenting with how Python 3 handles bytes when reading and writing data, and I have come across a particularly troubling problem whose source I can't seem to find. I am basically reading bytes out of a JPEG file, converting them to integers using ord(), then returning each integer to its original character using the line chr(character).encode('utf-8') and writing it back into a JPEG file. No issue, right? Well, when I try opening the new JPEG file, I get a Windows 8.1 notification saying it cannot open the photo. When I check the two files against each other, one is 5.04 MB and the other is 7.63 MB, which has me awfully confused.

def __main__():
    operating_file = open('photo.jpg', 'rb')

    while True:
        data_chunk = operating_file.read(64*1024)
        if len(data_chunk) == 0:
            print('COMPLETE')
            break
        else:
            new_operation = open('newFile.txt', 'ab')
            for character in list(data_chunk):
                new_operation.write(chr(character).encode('utf-8'))


if __name__ == '__main__':
    __main__()

This is the exact code I am using. Any ideas on what is happening and how I can fix it?

NOTE: I am assuming that the list of numbers that list(data_chunk) provides is equivalent to calling ord() on each character.


Solution

  • Here is a simple example you might wish to play with:

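    import sys

    # Read the whole file; stuff refers to a bytes object.
    with open('gash.txt', 'rb') as f:
        stuff = f.read()

    print(stuff)

    with open('gash2.txt', 'wb') as f2:
        # Iterating over a bytes object yields ints, so each one
        # is converted back into a single byte before writing.
        for i in stuff:
            f2.write(i.to_bytes(1, sys.byteorder))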

    As you can see, the bytes object is iterable, but in the for loop we get back an int in i. To convert that int back to a single byte, I use the int.to_bytes() method.
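
As for why the copy in the question grows: chr(n).encode('utf-8') produces one byte only for values below 128, and two bytes for values from 128 to 255, because UTF-8 encodes those code points as a two-byte sequence. The sketch below illustrates this on a made-up sample containing every byte value once; the helper names utf8_roundtrip and byte_roundtrip are hypothetical and exist only for this illustration.

```python
def utf8_roundtrip(data: bytes) -> bytes:
    """Mimic the question's approach: treat each byte value as a
    Unicode code point and encode that character as UTF-8."""
    return b"".join(chr(value).encode("utf-8") for value in data)


def byte_roundtrip(data: bytes) -> bytes:
    """Turn each int back into exactly one byte."""
    return b"".join(bytes([value]) for value in data)


# Hypothetical sample: every possible byte value exactly once.
sample = bytes(range(256))

inflated = utf8_roundtrip(sample)
exact = byte_roundtrip(sample)

# 128 values stay one byte, 128 values become two bytes: 128 + 256 = 384.
print(len(sample), len(inflated), len(exact))  # 256 384 256

print(chr(0x7F).encode("utf-8"))  # b'\x7f'      (one byte)
print(chr(0xFF).encode("utf-8"))  # b'\xc3\xbf'  (two bytes)
```

If roughly half the bytes of a file are 128 or above, this kind of round trip yields an output about 1.5 times the input size, which is at least consistent with 5.04 MB turning into 7.63 MB. In the question's loop, writing the chunk unchanged with new_operation.write(data_chunk) avoids the per-byte conversion altogether.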