python, python-3.5, string-parsing, python-unicode

How can I parse a bytestring in Python 3?


Basically, I have two bytestrings in a single line like this:

b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

This is UTF-8 encoded text that I'm importing from an online file using urllib, and I want to compare the individual bytestrings so that I can replace the wrong ones. However, I can't find any way to parse the line so that I get \xe0\xa6\xb8\xe0\xa6\x96 and \xe0\xa6\xb6\xe0\xa6\x96 in two different variables.

I tried converting it into a regular string with str(b'\xe0\xa6\xb8\xe0\xa6\x96'), and indexing the result does work, but then I can't convert it back to the original bytestring.

Is it possible?


Solution

  • I would recommend trying something like this...

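    arr = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'
    
    # Decode the bytes to str (UTF-8 is the default), then split on the separator
    splt = arr.decode().split(' - ')
    
    # Re-encode each part to get the original bytes back
    b_arr1 = splt[0].encode()
    b_arr2 = splt[1].encode()  # still ends with b'\n'; use splt[1].strip() to drop it
    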

    I tried it out in the Python 3 terminal and it works fine.
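
If the decoded text is not needed, the decode/encode round trip can probably be skipped, because bytes objects have their own strip(), split() and replace() methods that take bytes arguments. The following is only a sketch under that assumption; the variable names (line, first, second, fixed) are made up for illustration, and the replacement at the end is just one guess at what "replace the wrong ones" might mean.

```python
line = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# strip() removes the trailing newline; the separator must itself be bytes
first, second = line.strip().split(b' - ')

# Decode only when readable text is actually needed
first_text = first.decode('utf-8')
second_text = second.decode('utf-8')

# Hypothetical fix-up: treat `second` as the wrong spelling and swap in `first`
fixed = line.replace(second, first)
```

Splitting on the bytes level keeps every value as bytes throughout, so there is nothing to convert back afterwards. Decoding first, as in the answer above, is the better fit when the parts also need to be handled as text.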