On OS X (PHP 5.2.11) I have a file, siësta.doc (and thousands of others with Unicode filenames), and I want to convert the file names to a web-consumable format (a-zA-Z0-9.). If I hardcode the file name above, I get the right conversion:
<?php
$file = 'siësta.doc';
echo preg_replace("/[^a-zA-Z0-9.]/u", '_', $file);
// Output: si_sta.doc
?>
But if I read the file names with scandir, I get strange conversions:
<?php
$files = scandir(DIRNAME);
foreach ($files as $file) {
    echo preg_replace("/[^a-zA-Z0-9.]/u", '_', $file);
    // Output for the file above: sie_sta.doc
}
?>
I tried detecting the encoding, setting the encoding, and converting it with the iconv functions. I also tried the mb_ functions, but that only made things worse. What am I doing wrong?
Thanks in advance
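Interesting. After a bit of research I found that OS X stores filenames as "decomposed Unicode" (see http://developer.apple.com/mac/library/qa/qa2001/qa1173.html). That is, "ë" is represented as "e" followed by the combining diaeresis (U+0308, which is the byte pair 0xCC 0x88 in UTF-8). Since the plain "e" is an allowed character, only the combining mark gets replaced, which is why you end up with sie_sta.doc instead of si_sta.doc.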
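One way to get the same result for both forms is to treat a base letter plus its combining marks as a single character before replacing. This is only a minimal sketch: `web_safe_filename` is a made-up helper name, and it assumes your PCRE build supports Unicode properties (`\p{M}`). If the intl extension is available to you, recomposing with `Normalizer::normalize($name, Normalizer::FORM_C)` before your original regex should be an alternative.

```php
<?php
// Hypothetical helper, not part of the original question.
function web_safe_filename($name)
{
    // First alternative: any code point followed by one or more combining
    // marks (e.g. "e" + U+0308) counts as one non-ASCII character.
    // Second alternative: any remaining character outside a-zA-Z0-9.
    return preg_replace('/.\p{M}+|[^a-zA-Z0-9.]/u', '_', $name);
}

$decomposed  = "sie\xCC\x88sta.doc"; // "e" + combining diaeresis, as scandir returns it on OS X
$precomposed = "si\xC3\xABsta.doc";  // single code point U+00EB, as typed in a source file

echo bin2hex("e\xCC\x88"), "\n";             // 65cc88
echo bin2hex("\xC3\xAB"), "\n";              // c3ab
echo web_safe_filename($decomposed), "\n";   // si_sta.doc
echo web_safe_filename($precomposed), "\n";  // si_sta.doc
```

The two bin2hex lines just show that the "same" letter really is two different byte sequences, which is why the hardcoded string and the scandir result behaved differently.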