I am working on a natural language processing project with deep learning, and I downloaded a word embedding file. The file is in .bin
format. I can open the file with
file = open("cbow.bin", "rb")
But when I type
file.read(100)
I get
b'4347907 300\n</s> H\xe1\xae:0\x16\xc1:\xbfX\xa7\xbaR8\x8f\xba\xa0\xd3\xee9K\xfe\x83::m\xa49\xbc\xbb\x938\xa4p\x9d\xbat\xdaA:UU\xbe\xba\x93_\xda9\x82N\x83\xb9\xaeG\xa7\xb9\xde\xdd\x90\xbaww$\xba\xfdba:\x14.\x84:R\xb8\x81:0\x96\x0b:\x96\xfc\x06'
What is this language, and how can I convert it into actual numbers and text using Python?
The "weird language" you are referring to is not a language at all; it is a Python bytestring (a bytes object).
As @jolitti implied, you won't be able to convert this particular bytestring to readable text as a whole, because most of it is raw binary data rather than encoded characters.
If the bytestring contained only characters you recognize, it would have been displayed like this:
b'Guido van Rossum'
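That said, the readable start of your output (b'4347907 300\n</s> ') looks like the header of the word2vec binary format: a text line with the vocabulary size and the vector dimension, followed by one entry per word, each made of the word, a space, and the vector as raw 32-bit floats. Assuming that is what cbow.bin contains (I can't confirm it from 100 bytes, and little-endian float32 is also an assumption), a minimal sketch of reading it with only the standard library could look like this. The function names read_word2vec_header and read_word2vec_entry are made up for illustration, and the sample bytes are synthetic so the snippet runs without your file:

```python
import io
import struct


def read_word2vec_header(stream):
    """Read the text header line b'<vocab_size> <dim>\\n' (assumed layout)."""
    vocab_size, dim = stream.readline().split()
    return int(vocab_size), int(dim)


def read_word2vec_entry(stream, dim):
    """Read one word and its vector of `dim` little-endian float32 values."""
    chars = []
    while True:
        ch = stream.read(1)
        if ch == b"":
            raise EOFError("unexpected end of data while reading a word")
        if ch == b" ":
            break  # a single space separates the word from its vector
        if ch != b"\n":  # some writers put a newline after each vector
            chars.append(ch)
    word = b"".join(chars).decode("utf-8", errors="replace")

    raw = stream.read(4 * dim)  # 4 bytes per float32
    if len(raw) != 4 * dim:
        raise EOFError("unexpected end of data while reading a vector")
    vector = struct.unpack(f"<{dim}f", raw)
    return word, vector


# Synthetic stand-in for the real file: 2 words, 3 dimensions each.
sample = (
    b"2 3\n"
    + b"</s> " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n"
    + b"cat " + struct.pack("<3f", 1.0, 0.25, -0.5) + b"\n"
)

stream = io.BytesIO(sample)
vocab_size, dim = read_word2vec_header(stream)
entries = [read_word2vec_entry(stream, dim) for _ in range(vocab_size)]

for word, vector in entries:
    print(word, vector)
```

To try it on your data, pass the object returned by open("cbow.bin", "rb") instead of the io.BytesIO stream. If the file really is in word2vec format, gensim's KeyedVectors.load_word2vec_format("cbow.bin", binary=True) should load it in one call, which is probably the more practical route for a file with millions of words.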