
PHP doesn't recognize filename with accented character "é" in it


I am trying to check with PHP whether a file exists. The file name contains the character "é": 13067-AP-03 A - Situation projetée.pdf.

The code I use to check if the file exists is:

$filename = 'C:/13067-AP-03 A - Situation projetée.pdf';

if (file_exists($filename)) 
{
    echo "The file exists";
} else 
{
    echo "The file does not exist";
}

The problem is that whenever I run this check, I get the message that the file does not exist. If I remove the "é" from the name, I get the message that the file does exist.

It looks like PHP doesn't recognize the file if its name contains an accented character. I tried the following:

urlencode($filename);
addslashes($filename);
utf8_encode($filename);

None of these worked. I also tried:

setlocale(LC_ALL, "en_US.utf8");

It may be worth noting that when I get the filename straight from PHP, it comes out as:

13067-AP-03 A - Situation projet�e.pdf

I have to do the following to have the filename displayed correctly:

$filename = iconv( "CP437", 'UTF-8', $filename);

I was wondering if someone had the same problem before and could help me out with this one. All help is greatly appreciated.

For those who are interested, the script runs on a Windows machine.

Strangely, this worked: I copied all the source code from Sublime Text 3 into Notepad, then saved it from Notepad, overwriting the PHP file.

Now when I check whether the file exists, it reports that the following filename exists:

13067-AP-03 A - Situation projet�e.pdf

The only remaining problem is that I want to download the file using file_get_contents, but file_get_contents doesn't interpret the "é" as the correct character.


Solution

  • I think this is a problem with PHP on Windows. I downloaded a Windows binary of PHP onto my Japanese-language Windows machine and successfully reproduced your problem.

    According to https://bugs.php.net/bug.php?id=47096:

    So, if you have a generic name of a file (along with its path) as a Unicode string $u (for example UTF-8 encoded) and you want to try to save it with that name under Windows, you must first check the current locale calling setlocale(LC_CTYPE, 0) to retrieve the current code page, then you must convert $u to an array of bytes according to the code page; if one or more code points have no counterpart in the current code page, the file cannot be saved with that name from PHP. Dot.

    My code page is CP932; you can check yours by running chcp in cmd.

    So the code would be expected to look like this:

    $filename='C:\Users\Frederick\Desktop\13067-AP-03 A - Situation projetée.pdf';
    $filename=mb_convert_encoding($filename, 'CP932', 'UTF-8');
    var_dump($filename);
    var_dump(file_exists($filename));
    

    But this won't work! Why? Because CP932 doesn't contain the character é!

    According to https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396:

    NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.

    Windows itself stores file names as UTF-16LE, which Microsoft calls Unicode. But PHP doesn't support UTF-16LE encoded file names.

    In conclusion, I'm sorry to say that I cannot find a way to solve the problem on Windows other than escaping all such characters when naming the files. I also do not think the PHP team will fix this in the future.
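
The procedure quoted from the bug report (read the code page from the current locale, convert the UTF-8 name to that code page, and give up if a character has no counterpart) could be sketched roughly as follows. This is only an illustration under assumptions: codePageFromLocale() and toCodePage() are hypothetical helper names, not PHP built-ins; the locale string is assumed to end in a numeric code page such as "Japanese_Japan.932"; and the iconv extension is assumed to be available and to know the code page under the name "CP" plus that number.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: extract the code page from a Windows-style locale
 * string such as "Japanese_Japan.932". Returns e.g. "CP932", or null when
 * the string has no numeric code page suffix (for example the "C" locale).
 */
function codePageFromLocale(string $locale): ?string
{
    return preg_match('/\.(\d+)$/', $locale, $m) === 1 ? 'CP' . $m[1] : null;
}

/**
 * Hypothetical helper: convert a UTF-8 file name to the bytes of the given
 * code page. Returns null when the name cannot be expressed in it.
 */
function toCodePage(string $utf8Name, string $codePage): ?string
{
    // Without //TRANSLIT or //IGNORE, iconv() normally fails on characters
    // that the target code page cannot represent.
    $bytes = @iconv('UTF-8', $codePage, $utf8Name);
    if ($bytes === false) {
        return null;
    }
    // Round-trip check: guards against iconv builds that substitute a
    // placeholder character instead of failing.
    return @iconv($codePage, 'UTF-8', $bytes) === $utf8Name ? $bytes : null;
}

// Usage: passing "0" to setlocale() only queries the current setting.
$locale   = setlocale(LC_CTYPE, '0');
$codePage = is_string($locale) ? codePageFromLocale($locale) : null;

$name  = 'C:/13067-AP-03 A - Situation projetée.pdf';
$bytes = $codePage !== null ? toCodePage($name, $codePage) : null;

if ($bytes === null) {
    echo "The name cannot be expressed in the active code page\n";
} else {
    var_dump(file_exists($bytes));
}
```

With a code page such as CP932 the conversion step is expected to return null for this particular name, which matches the failure described above; with a code page that does contain "é", the converted bytes are what would be handed to file_exists().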