Search code examples
unicodecharacter

How can I clean source code files of invisible characters?


I have a bizarre problem: Somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it I identified it as  or 'Zero width no-break space'. It shows up as non-empty text node in my website and is causing a serious layout problem.

The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (duh). I can't seem to find it, no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.

How can I find characters by charcode across files or something like that? I'm open to different tools, but they have to work on Mac OS X.


Solution

  • You don't get the character in the editor, because you can't find it in text editors. #FEFF or #FFFE are so-called byte-order marks. They are a Microsoft invention to tell in a Unicode file, in which order multi-byte characters are stored.

    To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or use some kind of truncation tool like, e.g., a hex editor that allows you to see how the file really looks.

    On googling, it seems, that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can use Vim:

    :set nobomb
    

    and save the file. Presto!

    The characters are always the very first in a text file. Editors with support for the BOM will not, as I mentioned, show it to you at all.