python-2.7 python-3.4 binaryfiles

Reading a binary file works in Python 2 but not in Python 3


#!/usr/bin/env python3

f = open('dv.bmp', mode='rb')
slika = f.read()
f.close()

pic   = slika[:28]
slika = slika[54:] 
# dimensions of the original bitmap
pic_w = ord(pic[18]) + ord(pic[19])*256
pic_h = ord(pic[22]) + ord(pic[23])*256
print(pic_w, pic_h)

Why doesn't this code work in Python 3 (it works fine in Python 2)? Or, how do I read a binary file into a string type in Python 3?


Solution

  • In Python 2.x, binary mode (e.g. 'rb') only affects how Python interprets end-of-line characters:

    On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.

    However, in Python 3.x, binary mode also changes the type of the resulting data:

    Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding. If encoding is not specified, the default is platform dependent (see open()). 'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.

    Since read() returns a bytes object, indexing it yields an integer, not a one-character string as in Python 2. Passing that integer to ord() raises a TypeError ("ord() expected string of length 1, but int found"), which is the error mentioned in your comment.

    The solution is just to omit the ord() call in Python 3, since the integer you get from indexing the bytes object is the same as what you'd get from calling ord() on the string equivalent.
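
As a minimal sketch of that fix, the snippet below applies the question's arithmetic to a bytes object without ord(). The helper name bmp_dimensions and the 28-byte sample header are assumptions made up for illustration (the sample stands in for the first bytes of 'dv.bmp', which is not available here); the offsets 18, 19, 22 and 23 are the ones used in the question.

```python
def bmp_dimensions(header):
    """Return (width, height) using the question's two-byte arithmetic.

    `header` is a bytes object, e.g. the result of f.read(28) on a file
    opened with mode='rb'. The helper name is hypothetical.
    """
    # In Python 3, indexing bytes already yields ints, so ord() is dropped.
    width = header[18] + header[19] * 256
    height = header[22] + header[23] * 256
    return width, height


# Hypothetical header: width 640 (0x0280) and height 480 (0x01E0),
# each stored low byte first, at the offsets the question reads.
sample = bytearray(28)
sample[18:20] = b'\x80\x02'
sample[22:24] = b'\xe0\x01'
sample = bytes(sample)

print(type(sample[18]))        # <class 'int'>
print(bmp_dimensions(sample))  # (640, 480)

# Calling ord() on the int reproduces the Python 3 failure.
try:
    ord(sample[18])
except TypeError as exc:
    print("ord() failed:", exc)
```

With a real file, the header would come from something like `with open('dv.bmp', mode='rb') as f: header = f.read(28)`. As an alternative not mentioned in the answer, Python 3's built-in `int.from_bytes(header[18:20], 'little')` should give the same value as the manual low-byte-plus-high-byte sum.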