python string parsing eeprom

How to Remove Garbage Data from a String


I'm in a situation where I have to use Python to read and write to an EEPROM on an embedded device. The first page (256 bytes) is used for non-volatile data storage. My problem is that the stored values vary in length, but I have to read a fixed number of bytes.

For example, a string is stored at address 30 and can be anywhere from 6 to 10 bytes long. I need to read the maximum possible length, because I don't know where it ends. As a result, I get excess garbage at the end of the string.

data_str = ee_read(bytecount)
dbgmsg("Reading from EEPROM: addr = " + str(addr_low) + " value = " + str(data_str))

> Reading from EEPROM: addr = 30 value = h11c13����

I am fairly new to Python. Is there a way to automatically chop off that data in the string after it's been read in?


Solution

  • Do you mean something like:

    >>> s = 'Reading from EEPROM: addr = 30 value = h11c13����'
    >>> s
    'Reading from EEPROM: addr = 30 value = h11c13\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd'
    >>> filter(lambda x: ord(x)<128,s)
    'Reading from EEPROM: addr = 30 value = h11c13'
    

    On python3, filter returns an iterator rather than a string, so you'll need to join the result:

    ''.join(filter(lambda x: ord(x)<128,s))
    

    A version which works for python2 and python3 would be:

    ''.join(x for x in s if ord(x) < 128)
    

    Finally, it is conceivable that the excess garbage could contain printing characters. In that case you might want to keep only the characters up to the first non-printing character. itertools.takewhile could be helpful here:

    import string  # the string module provides string.printable on both python2 and python3
    from itertools import takewhile
    
    printable = set(string.printable)  
    ''.join(takewhile(lambda x: x in printable, s))
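
To make the difference between filtering and takewhile concrete, here is a minimal sketch for python3. It assumes the read returns a bytes object rather than a str, which may not match what ee_read actually gives you. The function names and the sample value are made up for illustration, and the 0xff padding is only an assumption about what the leftover bytes look like.

```python
import string
from itertools import takewhile

# Byte values of the characters in string.printable (ASCII letters, digits,
# punctuation, and whitespace).
PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))


def keep_printable(raw):
    """Drop every non-printable byte, wherever it occurs."""
    return bytes(b for b in raw if b in PRINTABLE_BYTES).decode("ascii")


def take_printable_prefix(raw):
    """Keep bytes only up to the first non-printable byte."""
    return bytes(takewhile(lambda b: b in PRINTABLE_BYTES, raw)).decode("ascii")


# Hypothetical read: a 6-byte value followed by leftover bytes, one of which
# (0x41, "A") happens to be printable.
raw = b"h11c13\xff\x41\xff\xff"

print(keep_printable(raw))         # h11c13A
print(take_printable_prefix(raw))  # h11c13
```

The filter-style version lets the stray "A" through because it is a printable byte inside the garbage. The takewhile version stops at the first bad byte, so it is the safer choice when the real data always comes first.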