file encoding format utf

What is the format or encoding of a file with data like this?


I have a text file which I tried opening with Sublime Text on a Mac. When I just open the file, I see data like this...

efbf bdef bfbd 5300 4b00 5500 0900 4900
4d00 4100 4700 4500 5500 5200 4c00 0900

If I reopen it with UTF-16 LE encoding, I see:

 뿯붿SKU 

Could you help me determine the format and/or encoding of this file?

If I open the file with Excel or Mac's TextEdit, I see the same thing as with UTF-16 LE encoding in Sublime.

So if the file is UTF-16 LE encoded, what are those special characters?


Solution

  • EF BF BD is the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER (�), and your dump starts with two of them (efbf bdef bfbd). That likely means the data was originally in some encoding other than UTF-8 (say ISO-8859-1), but was parsed at some point by a UTF-8 system that replaced each illegal byte with REPLACEMENT CHARACTER.

    Without more background on how you came to have this file, it's hard to speculate on the precise cause. It's even possible that it's Sublime Text that's doing this replacement and the file itself is in some other encoding (and hasn't been modified).

    I would make sure that these are really the bytes in the file, and it isn't just Sublime Text displaying it in a funny way. Use a simpler tool like xxd to dump the contents as hex bytes and make sure this is really what's in there.
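
As a rough cross-check, the bytes shown in the question can be decoded piece by piece. The sketch below is in Python and uses only the hex dump from the question. The last step rests on an assumption that the question does not confirm: that the file originally began with the UTF-16 LE byte order mark (FF FE) and that some UTF-8 tool replaced those two bytes.

```python
# Bytes copied from the hex dump in the question.
dump = bytes.fromhex(
    "efbf bdef bfbd 5300 4b00 5500 0900 4900"
    "4d00 4100 4700 4500 5500 5200 4c00 0900"
)

# The first six bytes are two UTF-8 encoded U+FFFD characters.
prefix, body = dump[:6], dump[6:]
print(prefix.decode("utf-8") == "\ufffd\ufffd")  # True

# Everything after them decodes cleanly as UTF-16 LE.
print(repr(body.decode("utf-16-le")))  # 'SKU\tIMAGEURL\t'

# ASSUMPTION (not confirmed by the question): the file once started with
# the UTF-16 LE BOM, FF FE. Neither byte is valid UTF-8, so a lenient
# UTF-8 decoder turns each one into U+FFFD.
mangled = b"\xff\xfe".decode("utf-8", errors="replace").encode("utf-8")
print(mangled == prefix)  # True
```

If that assumption holds, it would explain why the rest of the file still reads as UTF-16 LE text while the first characters look like garbage. Checking the raw bytes with xxd, as suggested above, is still the way to confirm what is actually on disk.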