php encoding csv import-from-excel

How to differentiate between MacRoman and Windows-1251 encodings in PHP?


I've been pulling my hair out for a few days now. I've googled and searched Stack Overflow a lot without success.

I'm importing some data from a CSV file. This CSV file is generated in Excel on either Windows or Mac, which gives two different encodings: "Windows-1251" and "MacRoman". Both are single-byte encodings, and mb_detect_encoding does not help: it always returns the first encoding I put in the list.

For example:

mb_detect_encoding($buffer, 'macroman, windows-1251, UTF-8');

will give "macroman".

With the same string, trying:

mb_detect_encoding($buffer, 'windows-1251, macroman, UTF-8');

will give "windows-1251".

So how can I reliably tell the two apart? I need to convert my input string (the CSV file content) to UTF-8 before inserting it into the DB.

Maybe I'm missing something? How do you guys usually parse CSV files and save the data properly in a UTF-8 DB?

Thanks for any clue!


Solution

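  • I think the only way to make sure this is handled properly is to define a process for saving the CSV file in the first place, so that every file arrives in one known encoding. Then you just have to utf8_encode what's coming in (keeping in mind that utf8_encode assumes ISO-8859-1 input) and it'll go fine...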
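
If you can't control how the file is saved, here is a minimal sketch of a byte-frequency guess followed by an explicit conversion. It is only a heuristic, not a reliable detector. The function names guessCsvEncoding and csvToUtf8 are made up for this example, and the encoding labels 'MACINTOSH' and 'WINDOWS-1251' are assumptions about what your iconv build accepts. The idea is that MacRoman keeps its accented Latin letters in the 0x80-0x9F range, while Windows-1251 (and Windows-1252) keep their letters in 0xC0-0xFF, so whichever range dominates hints at the source.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: guess the encoding of raw CSV bytes.
 * Returns 'UTF-8', 'MACINTOSH' or 'WINDOWS-1251' (labels are assumptions).
 */
function guessCsvEncoding(string $bytes): string
{
    // An empty pattern with the /u flag only matches when the subject is valid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return 'UTF-8';
    }

    $low = 0;  // bytes 0x80-0x9F: accented Latin letters in MacRoman
    $high = 0; // bytes 0xC0-0xFF: letters in Windows-1251 / Windows-1252

    // count_chars(..., 1) returns byte value => number of occurrences.
    foreach (count_chars($bytes, 1) as $byte => $count) {
        if ($byte >= 0x80 && $byte <= 0x9F) {
            $low += $count;
        } elseif ($byte >= 0xC0) {
            $high += $count;
        }
    }

    // Heuristic only: MacRoman also uses some high bytes (e.g. curly quotes).
    return $low > $high ? 'MACINTOSH' : 'WINDOWS-1251';
}

/**
 * Hypothetical helper: convert CSV bytes to UTF-8, guessing the source
 * encoding when the caller does not already know it.
 */
function csvToUtf8(string $bytes, ?string $sourceEncoding = null): string
{
    $sourceEncoding = $sourceEncoding ?? guessCsvEncoding($bytes);
    if ($sourceEncoding === 'UTF-8') {
        return $bytes;
    }

    $converted = iconv($sourceEncoding, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceEncoding");
    }

    return $converted;
}
```

Something like csvToUtf8(file_get_contents($path)) would then give you a string that is safe to insert into a UTF-8 table. If you do know the source encoding, pass it as the second argument and skip the guess entirely. Also note that utf8_encode is deprecated as of PHP 8.2, so an explicit iconv (or mb_convert_encoding) call is the safer choice anyway.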