utf-8 character-encoding binaryfiles

Why do some binary files have null bytes between characters when storing strings?


Back when I used to mess around with .CON files (the native file format that games would use on the Xbox 360), I remember seeing the text in those files separated by null bytes when viewing them with a hex editor.

Today I also noticed, while looking at a localStorage file from Chrome with an SQLite browser, that all the text fields are stored as binary/blob values like this:

22007700730073003a002f002f006700
6100740065007700610079002e006400
6900730063006f00720064002e006700
67002200

What's with the null bytes? Is this a different type of character encoding? I figure it can't be UTF-8, since UTF-8 wouldn't use two bytes to encode characters in the ASCII range, but maybe I'm wrong?


Solution

  • That's UTF-16 (little-endian, since the character byte comes before the null byte) for "wss://gateway.discord.gg", quotation marks included. (If you see alternating ASCII bytes and null bytes, you can bet that it's UTF-16.)

    Since a JavaScript string is UTF-16, and SQLite supports storing text in UTF-16, it's not surprising that Chrome would use this for its implementation of localstorage.
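To check this yourself, here is a minimal Python sketch that decodes the hex dump from the question. It assumes the bytes are little-endian UTF-16 with no byte order mark, which matches the dump shown; the variable names are only illustrative.

```python
# Hex dump from the question, joined into one string.
hex_dump = (
    "22007700730073003a002f002f006700"
    "6100740065007700610079002e006400"
    "6900730063006f00720064002e006700"
    "67002200"
)

raw = bytes.fromhex(hex_dump)

# Assumption: little-endian UTF-16 without a BOM. The low byte comes first,
# so each ASCII character shows up as <character byte> followed by 0x00.
text = raw.decode("utf-16-le")
print(text)

# For comparison, UTF-8 stores each ASCII character in a single byte,
# so the same text contains no null bytes and is half the size.
utf8_bytes = text.encode("utf-8")
print(len(raw), len(utf8_bytes))
```

If the null byte came before each character instead (00 22 00 77 ...), the data would be big-endian and "utf-16-be" would be the codec to try.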