I have a .raw file containing a 52-line HTML header followed by the data itself. The data is encoded as little-endian 24-bit signed integers, and I want to convert it to integers in an ASCII file. I use Python 3.
I tried to 'unpack' the entire file with the following code found in this post:
import sys
import chunk
import struct
f1 = open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode = 'rb')
data = struct.unpack('<i', chunk + ('\0' if chunk[2] < 128 else '\xff'))
But I get this error message:
TypeError: 'module' object is not subscriptable
EDIT
It seems this is better:
data = struct.unpack('<i','\0'+ bytes)[0] >> 8
But I still get an error message:
TypeError: must be str, not type
Easy to fix I presume?
That's not a nice file to process in Python! Python is great for processing text files, because it reads them in big chunks into an internal buffer and then iterates over lines, but you cannot easily access binary data that comes after text read that way. Additionally, the struct module has no support for 24-bit values.
The only way I can imagine is to read the file one byte at a time: first skip 52 end-of-line characters, then read the bytes 3 at a time, prepend a null byte to each group to build a 4-byte string, and unpack it.
Possible code could be:
import struct

eol = b'\n'     # or whatever the end of line is in your file
nlines = 52     # number of header lines to skip
with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode = 'rb') as f1:
    for i in range(nlines):         # process nlines lines
        t = b''                     # to store the content of each line
        while True:
            x = f1.read(1)          # one byte at a time
            if x == eol or not x:   # one full line read (or end of file reached)
                break
            else:
                t += x              # else concatenate into current line
        print(t)                    # to check the initial 52 lines
    while True:
        t = bytes((0,))             # struct only knows how to process 4-byte ints
        for i in range(3):          # so build one starting with a null byte
            t += f1.read(1)
        # print(t)
        if len(t) == 1: break       # reached end of file
        if len(t) < 4:              # reached end of file with an incomplete value
            print("Remaining bytes at end of file", t)
            break
        # the trick is that the integer division by 256 drops the initial 0 byte and keeps the sign
        i = struct.unpack('<i', t)[0] // 256    # // for Python 3, only / for Python 2
        print(i, hex(i))            # or any other more useful processing
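To see why the padding trick keeps the sign, here is a minimal sketch that decodes a few hand-picked 3-byte samples in memory. The helper names int24_via_struct and int24_via_from_bytes are made up for illustration. The second helper uses Python 3's built-in int.from_bytes, which is a different technique from the struct approach and is included only as a cross-check.

```python
import struct

def int24_via_struct(three_bytes):
    """Decode 3 little-endian bytes as a signed 24-bit integer (padding trick)."""
    # Prepending a zero byte puts the data in the upper 24 bits of a
    # 32-bit little-endian int, so the sign bit lands where struct expects it.
    padded = b'\x00' + three_bytes
    # Floor division by 256 drops the padding byte and preserves the sign.
    return struct.unpack('<i', padded)[0] // 256

def int24_via_from_bytes(three_bytes):
    """Same decoding with the built-in int.from_bytes (Python 3 only)."""
    return int.from_bytes(three_bytes, byteorder='little', signed=True)

# Hand-picked samples: smallest positive, largest positive, -1, most negative
samples = [b'\x01\x00\x00', b'\xff\xff\x7f', b'\xff\xff\xff', b'\x00\x00\x80']
for raw in samples:
    print(raw.hex(), int24_via_struct(raw), int24_via_from_bytes(raw))
```

Both helpers should print the same value for every sample, which suggests the division by 256 is doing what the comment in the code claims.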
Remark: the header-skipping loop above assumes that your description of 52 lines (each terminated by an end of line) is accurate, but the image you posted suggests that the last line is not terminated. In that case, you should first skip 51 complete lines and then skip the content of the last line.
def skiplines(fd, nlines, eol):
    for i in range(nlines):         # process nlines lines
        t = b''                     # to store the content of each line
        while True:
            x = fd.read(1)          # one byte at a time
            if x == eol or not x:   # one full line read (or end of file reached)
                break
            else:
                t += x              # else concatenate into current line
        # print(t)                  # to check the skipped lines

with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode = 'rb') as f1:
    skiplines(f1, 51, b'\n')        # skip 51 lines terminated with a \n
    skiplines(f1, 1, b'>')          # skip last line assuming it ends at the >
    ...
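Since the goal in the question is an ASCII file of integers, the elided part could be filled in along these lines. This is only a sketch under assumptions: the function name raw24_to_ascii is hypothetical, it uses int.from_bytes rather than the struct padding trick, it writes one integer per line, and the demo runs on in-memory buffers standing in for the real file once it is positioned just after the header.

```python
import io

def raw24_to_ascii(src, dst):
    """Read signed little-endian 24-bit values from src (binary stream)
    and write them to dst (text stream), one integer per line.

    Returns the number of values written. Trailing bytes that do not
    form a complete 3-byte value are ignored.
    """
    count = 0
    while True:
        chunk = src.read(3)
        if len(chunk) < 3:      # end of data, or incomplete trailing value
            break
        value = int.from_bytes(chunk, byteorder='little', signed=True)
        dst.write('%d\n' % value)
        count += 1
    return count

# Demo on in-memory data (assumed stand-in for the file after the header)
src = io.BytesIO(b'\x01\x00\x00\xff\xff\xff\x00\x00\x80')
dst = io.StringIO()
n = raw24_to_ascii(src, dst)
print(n, dst.getvalue().split())
```

With the real file, src would be f1 after the skiplines calls, and dst would be an output file opened in text mode with a name of your choosing.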