python, parsing, hex, offset

Python Read Certain Number of Bytes After Character


I'm dealing with a character-separated hex file in which each field begins with a particular start code. I've opened the file in 'rb' mode. Once I have the index of the start code from .find, how do I read a certain number of bytes starting at that position? This is how I am loading the file and what I am attempting to do:

with open(someFile, 'rb') as fileData:
    startIndex = fileData.find('(G')
    data = fileData[startIndex:7]

where 7 is the number of bytes I want to read, starting from the index returned by the find function. I am using Python 2.7.3.


Solution

  • A file object has no .find method, so read the file's contents into a bytestring first. Under Python 2.7 you can then get the position of a substring and slice from it like this:

    >>> with open('student.txt', 'rb') as f:
    ...     data = f.read()
    ... 
    >>> data  # holds the French word for student: élève
    '\xc3\xa9l\xc3\xa8ve\n'
    >>> len(data)  # this shows we are dealing with bytes here, because "élève\n" would be 6 characters long, had it been properly decoded!
    8
    >>> len(data.decode('utf-8'))
    6
    >>> data.find('\xa8')  # continue with the bytestring...
    4
    >>> bytes_to_read = 3
    >>> data[4:4+bytes_to_read]  
    '\xa8ve'
    

    You can also search for the whole multi-byte character. Prefix the literal with b to mark it as bytes; Python 2.x works without the prefix, but Python 3 requires it. Note that a Python 3 bytes literal may only contain ASCII characters, so b'è' is a syntax error there; write the character with escapes (b'\xc3\xa8') or build it with u'è'.encode('utf-8'):

    >>> data.find(b'\xc3\xa8')  # UTF-8 encoding of è; in Python 2.x on a UTF-8 terminal data.find('è') works too (unfortunately, because it has led to a lot of confusion)
    3
    >>> bytes_to_read = 3
    >>> pos = data.find(b'\xc3\xa8')
    >>> data[pos:pos+bytes_to_read]  # slicing a bytestring with n:m returns the bytes from n up to, but not including, m
    '\xc3\xa8v'
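
Applied to the case in the question, the same find-then-slice idea might look like the sketch below. The helper name bytes_from_marker and the sample bytes are made up for illustration, and io.BytesIO only stands in for open(someFile, 'rb') so the snippet runs without a file on disk. Note that the slice ends at start + count, not at count.

```python
import io


def bytes_from_marker(data, marker, count):
    """Return `count` bytes of `data` starting at the first `marker`.

    Hypothetical helper, not part of any library. Returns None when
    the marker is not present.
    """
    start = data.find(marker)
    if start == -1:  # find() returns -1 when the marker is absent
        return None
    # The end of the slice is start + count, not count on its own.
    return data[start:start + count]


# io.BytesIO stands in for open(someFile, 'rb'); the content is invented.
with io.BytesIO(b'junk(G12345(G67890') as file_data:
    content = file_data.read()  # read everything into one bytestring

field = bytes_from_marker(content, b'(G', 7)
print(repr(field))
```

The returned bytes include the start code itself, because the slice begins at the index that find reports. To skip the start code, begin the slice at start + len(marker) instead.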