Search code examples
phpcsvutf-8byte-order-mark

How can I output a UTF-8 CSV in PHP that Excel will read properly?


I've got this very simple thing that just outputs some stuff in CSV format, but it's got to be UTF-8. I open this file in TextEdit or TextMate or Dreamweaver and it displays UTF-8 characters properly, but if I open it in Excel it's doing this silly íÄ kind of thing instead. Here's what I've got at the head of my document:

header("content-type:application/csv;charset=UTF-8");
header("Content-Disposition:attachment;filename=\"CHS.csv\"");

This all seems to have the desired effect except Excel (Mac, 2008) doesn't want to import it properly. There's no options in Excel for me to "open as UTF-8" or anything, so … I'm getting a little annoyed.

I can't seem to find any clear solutions to this anywhere, despite a lot of people having the same problem. The thing I see the most is to include the BOM, but I can't exactly figure out how to do that. As you can see above I'm just echoing this data, I'm not writing any file. I can do that if I need to, I'm just not because there doesn't seem like a need for it at this point. Any help?

Update: I tried echoing the BOM as echo pack("CCC", 0xef, 0xbb, 0xbf); which I just pulled from a site that was trying to detect the BOM. But Excel just appends those three characters to the very first cell when it imports, and still messes up the special characters.


Solution

  • To quote a Microsoft support engineer,

    Excel for Mac does not currently support UTF-8

    Update, 2017: This is true of all versions of Microsoft Excel for Mac before Office 2016. Newer versions (from Office 365) do now support UTF-8.

    In order to output UTF-8 content that Excel both on Windows and OS X will be able to successfully read, you will need to do two things:

    1. Make sure that you convert your UTF-8 CSV text to UTF-16LE

      mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
      
    2. Make sure that you add the UTF-16LE byte order mark to the start of the file

      chr(255) . chr(254)
      

    The next problem that appears only with Excel on OS X (but not Windows) will be when viewing a CSV file with comma separated values, Excel will render rows only with one row and all of the text along with the commas in the first row.

    The way to avoid this is to use tabs as your separated value.

    I used this function from the PHP comments (using tabs "\t" instead of commas) and it worked perfectly on OS X and Windows Excel.

    Note that to fix an issue with an empty column as the end of a row, that I did have to change the line of code that says:

        $field_cnt = count($fields);
    

    to

        $field_cnt = count($fields)-1;
    

    As some of the other comments on this page say, other spreadsheet apps like OpenOffice Calc, Apple's own Numbers and Google Doc's Spreadsheet have no issues with UTF-8 files with commas.

    See the table in this question for what works and doesn't work for Unicode CSV files in Excel


    As a side note, I might add that if you are using Composer, you should have a look at adding League\Csv to your requires. League\Csv has a really nice API for building CSV files.

    To use League\Csv with this method of creating CSV files, check out this example