Tags: php, utf-8, character-encoding, byte-order-mark, mojibake

How do I remove ï»¿ from the beginning of a file?


I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿

The PHP script removes all whitespace, so a stray ï»¿ in the middle of the merged code breaks the entire stylesheet. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't easily remove them.

I googled the problem, and there is clearly something wrong with the file encoding, which makes sense, since I've been moving the files between different Linux/Windows servers via FTP and rsync, using a range of text editors. I don't know much about character encoding though, so help would be appreciated.

If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.


Solution

  • Three words for you:

    Byte Order Mark (BOM)

    ï»¿ is how the three bytes of the UTF-8 BOM (EF BB BF) appear when they are decoded as ISO-8859-1. You have to tell your editor not to write BOMs, or use a different editor to strip them out.

    To automate the BOM's removal, you can use awk, as shown in this question.

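    As another answer says, the best would be for PHP to handle the BOM correctly. You can set mb_internal_encoding() to UTF-8 so that the mbstring functions treat your strings as UTF-8. Be aware that this setting does not by itself remove the BOM bytes from data read with functions such as file_get_contents(), so the BOM still has to be stripped from each file's contents before merging:

        <?php
        // Store the previous encoding in case some other piece of code
        // is sensitive to encoding and counts on the default value.
        $previous_encoding = mb_internal_encoding();

        // Set the encoding to UTF-8 so the mb_* functions treat strings as UTF-8.
        // Note: this alone does not strip a BOM from file contents.
        mb_internal_encoding('UTF-8');

        // Process the CSS files, dropping any leading BOM bytes
        // ("\xEF\xBB\xBF") from each one before concatenating...

        // Finally, return to the previous encoding.
        mb_internal_encoding($previous_encoding);

        // Rest of the code...
        ?>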
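
Stripping the BOM explicitly while merging could look like the sketch below. It is only an illustration, not the original poster's merge script: the function names strip_utf8_bom() and merge_css_files() are hypothetical, and it assumes the files are UTF-8 and that the BOM, if present, sits at the very start of each file.

```php
<?php
// Hypothetical helper (not part of the original answer): returns $text
// without a leading UTF-8 BOM (bytes EF BB BF), or unchanged if none is present.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Hypothetical merge step: concatenates the given CSS files in order,
// removing each file's BOM so none ends up in the middle of the output.
function merge_css_files(array $paths): string
{
    $merged = '';
    foreach ($paths as $path) {
        $contents = file_get_contents($path);
        if ($contents === false) {
            throw new RuntimeException("Cannot read $path");
        }
        $merged .= strip_utf8_bom($contents) . "\n";
    }
    return $merged;
}
```

The comparison works on raw bytes, so it does not depend on the mbstring internal encoding. Any whitespace removal or minification would run on the result of merge_css_files() afterwards.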