Tags: python, python-3.x, character-encoding

How can I convert this language to actual numbers and text?


I am working on a natural language processing project with deep learning, and I downloaded a word embedding file. The file is in .bin format. I can open it with

file = open("cbow.bin", "rb")

But when I type

file.read(100)

I get

b'4347907 300\n</s> H\xe1\xae:0\x16\xc1:\xbfX\xa7\xbaR8\x8f\xba\xa0\xd3\xee9K\xfe\x83::m\xa49\xbc\xbb\x938\xa4p\x9d\xbat\xdaA:UU\xbe\xba\x93_\xda9\x82N\x83\xb9\xaeG\xa7\xb9\xde\xdd\x90\xbaww$\xba\xfdba:\x14.\x84:R\xb8\x81:0\x96\x0b:\x96\xfc\x06'  

What is this language, and how can I convert it into actual numbers and text using Python?


Solution

  • The "weird language" you are seeing is a Python bytestring: a bytes object holding raw bytes, which is what read() returns for a file opened in binary mode ("rb").

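    As @jolitti implied, you won't be able to convert this particular bytestring to readable text as a whole. Only the start (4347907 300, a newline, and </s>) is plain ASCII; the rest is binary data, most likely the 4-byte floating-point values of the word vectors, which have to be unpacked into numbers rather than decoded as text.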

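    Python displays each byte that is a printable ASCII character as that character and every other byte as a \xNN escape. That is why a few letters such as H and X appear among the escapes. If the bytestring consisted only of characters you recognize, it would have been displayed like this: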

    b'Guido van Rossum'
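A minimal sketch of the points above, assuming cbow.bin follows the original word2vec binary layout, which the header 4347907 300 suggests: a vocabulary size and a vector dimension on the first line, then for each word its bytes, a space, 300 four-byte floats, and a newline. It also assumes the floats are little-endian (as written on x86 machines). The function read_word2vec_bin and its limit parameter are hypothetical names for illustration, not part of any library, and the demo parses a tiny in-memory file built with struct.pack rather than the real cbow.bin.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Optional

# 1. How Python displays bytes: printable ASCII bytes as characters, others as \xNN
readable = b"Guido van Rossum"
mixed = bytes([0x48, 0xE1, 0xAE, 0x3A])  # the 4 bytes right after b'</s> ' in the question
print(repr(readable))            # b'Guido van Rossum'
print(repr(mixed))               # b'H\xe1\xae:'  (0x48 is 'H', 0x3A is ':')
print(readable.decode("ascii"))  # Guido van Rossum

# 2. The header line is text; the bytes after a word are a float, not characters
vocab_size, dim = map(int, b"4347907 300\n".split())
print(vocab_size, dim)                # 4347907 300
print(struct.unpack("<f", mixed)[0])  # about 0.00133 (assumes little-endian float32)


def read_word2vec_bin(f: BinaryIO, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Hypothetical parser for the assumed word2vec binary layout: a header line
    "vocab_size dim", then per word: word bytes, a space, dim float32 values, a newline.
    """
    vocab_size, dim = map(int, f.readline().split())
    if limit is not None:
        vocab_size = min(vocab_size, limit)  # stop early instead of loading everything
    vectors: Dict[str, List[float]] = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:  # read the word byte by byte up to the separating space
            ch = f.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("file ended in the middle of a word")
            if ch != b"\n":  # skip the newline that ends the previous vector
                word += ch
        raw = f.read(4 * dim)  # dim consecutive 4-byte floats
        vectors[word.decode("utf-8", errors="replace")] = list(struct.unpack(f"<{dim}f", raw))
    return vectors


# 3. Try the parser on a tiny in-memory file in the same assumed layout
demo = io.BytesIO(
    b"2 3\n"
    + b"</s> " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n"
    + b"cat " + struct.pack("<3f", 0.25, 0.0, 1.5) + b"\n"
)
vecs = read_word2vec_bin(demo)
print(vecs["cat"])  # [0.25, 0.0, 1.5]

# With the question's file (assuming it has this layout), e.g. only the first 1000 words:
# with open("cbow.bin", "rb") as f:
#     first_words = read_word2vec_bin(f, limit=1000)
```

For the full file, holding 4.3 million vectors as Python lists of floats would use a lot of memory; converting each vector with numpy.frombuffer(raw, dtype="<f4"), or loading the file with gensim's KeyedVectors.load_word2vec_format(path, binary=True) if gensim is installed, is usually more practical.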