PHP - Windows - filename incorrect after upload (ü saved as Ã¼ etc.)


I have a home-made app that allows multiple file uploads. I pass the files to PHP with AJAX, create a new directory with PHP, move the uploaded files there, and save the directory location to the database. To view the files, I list the directory stored in the DB.

The problem is that files come from all around the world, so very often they have non-Latin characters such as ü. When I echo the filename in PHP, names appear correctly even when they are written in Arabic, yet they are saved on the server with mis-encoded names, for example Ã¼ in place of ü. When I list the files from the directory I see the name ü.txt instead of Ã¼.txt, but when I click on it the server returns an "object not found" error (since on the server it is saved as Ã¼.txt and the link is read as ü.txt).

I tried some of the suggested solutions, for example using iconv, but the filenames are still saved the same way.

I could swear the problem wasn't present when the web app was hosted on Linux, but at the moment I am not so sure anymore. Right now I temporarily run it on XAMPP (on Windows), and it seems the filenames are saved using windows-1252 (the server's default Windows encoding). Is this problem related to the default Windows encoding?
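
One quick way to check that guess is to reproduce the garbling outside the upload code. The sketch below assumes the iconv extension is available; it takes the UTF-8 bytes of ü, deliberately misreads them as windows-1252, and prints the result.

```php
<?php
// "ü" encoded as UTF-8 is the two bytes 0xC3 0xBC.
$utf8Bytes = "\xC3\xBC";

// Misread those two bytes as Windows-1252 characters and
// re-encode the result as UTF-8 so it can be printed.
$misread = iconv('CP1252', 'UTF-8', $utf8Bytes);

echo $misread, "\n"; // Ã¼
```

If the names on disk show the same pattern, the UTF-8 bytes of the name were most likely handed to the filesystem as if they were windows-1252 text.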

To be honest, I do not know how to approach this problem and would appreciate any help. Should I keep trying to save the files in a different character encoding, or would it be better to approach it a different way and change how the already saved and encoded files are listed?

EDIT. According to the (finally) closed bug report, this was fixed in PHP 7.1.


Solution

  • In the end I solved it with the following approach:

    1. When uploading the files I URL-encode the names with rawurlencode().
    2. When fetching the files from the server the names are URL-encoded, so I use urldecode($filename) to print the correct names.
    3. Links in an a href are decoded automatically, so for example "%20" becomes " " and the URL ends up pointing to the wrong filename. I decided to encode the names once more when printing them, ending up with something like this: print $dirReceived.rawurlencode($file); ($dirReceived is the directory where received files are stored, defined earlier in the code).
    4. I also added a download attribute with urldecode($filename) to save the file with its UTF-8 name when needed.

    Thanks to this, the files are saved on the server with URL-encoded names. I can open them in the browser (very important, as most of them are *.pdf) and download them with the correct name, which lets me upload and download even files with names written in Arabic, Cyrillic, etc.
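
The steps above can be sketched as a round trip on a single name. This is only an illustration: the example filename is made up, and rawurldecode() is used for decoding because it is the exact counterpart of rawurlencode().

```php
<?php
// Hypothetical example name; any UTF-8 string behaves the same way.
$uploaded = "Übung 1.pdf";

$stored  = rawurlencode($uploaded); // name written to disk (step 1)
$display = rawurldecode($stored);   // name shown to the user (steps 2 and 4)
$href    = rawurlencode($stored);   // name used inside the link (step 3)

echo $stored, "\n";  // %C3%9Cbung%201.pdf
echo $display, "\n"; // Übung 1.pdf
echo $href, "\n";    // %25C3%259Cbung%25201.pdf
```

The link carries the name encoded twice, so after the usual single round of URL decoding the request ends up with exactly the name that is stored on disk.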

    So far I have tested it and it looks good. I am thinking of implementing it in production code. Any concerns or thoughts on it?

    EDIT.

    Since there are no objections, I am selecting my answer as the one that solved my problem. After some testing, everything looks good on both the client and the server side. When the files are saved on the server they are URL-encoded; when they are downloaded they are decoded and saved with the correct names.

    At the beginning I was using the code:

        for ($i = 0; $i < count($_FILES['file']['name']); $i++) {
            move_uploaded_file(
                $_FILES['file']['tmp_name'][$i],
                "../filepath/" . $_FILES['file']['name'][$i]
            );
        }
    

    This method caused the problem when saving the file and replaced every UTF-8 special character with its cp1252 misreading (ü saved as Ã¼ etc.), so I added one line and replaced that code with the following:

        for ($i = 0; $i < count($_FILES['file']['name']); $i++) {
            // Store the file under a URL-encoded (pure ASCII) name
            $fname = rawurlencode($_FILES['file']['name'][$i]);
            move_uploaded_file(
                $_FILES['file']['tmp_name'][$i],
                "../filepath/" . $fname
            );
        }
    

    This lets me save any filename on the server using URL encoding (% followed by two hexadecimal digits). The encoded name consists only of ASCII characters, which are identical in cp1252 and UTF-8.
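
To illustrate the point about the encoded name being safe in either code page, here is a small sketch with a made-up Arabic filename. It checks that the stored name contains only ASCII characters and shows how much longer the name becomes.

```php
<?php
// Assumption: the original name arrives as UTF-8, as in the question.
$original = "ملف.txt";
$stored   = rawurlencode($original);

// Only unreserved ASCII characters and %XX escapes should remain.
$isAsciiOnly = preg_match('/^[A-Za-z0-9._~%-]+$/', $stored) === 1;

echo $stored, "\n"; // %D9%85%D9%84%D9%81.txt
echo $isAsciiOnly ? "ASCII only" : "contains non-ASCII", "\n";
echo strlen($original), " -> ", strlen($stored), " bytes\n"; // 10 -> 22 bytes
```

One thing to keep in mind: every encoded byte takes three characters, so long non-Latin names can grow quickly and may run into the filesystem's limit on name length.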

    To list the saved files I take the directory paths stored in the DB and scan them for files. I was using the following code:

        if (is_dir($dir)) {
            if ($dh = opendir($dir)) {
                while (($file = readdir($dh)) !== false) {
                    if (is_file($dir . $file)) {
                        echo "<li><a href='" . $dir . $file . "' download='" . $file . "'>" . $file . "</a></li><br />";
                    }
                }
                closedir($dh);
            }
        }
    

    Since URL-encoded filenames in links were decoded automatically, I changed it to:

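        if (is_dir($dir)) {
            if ($dh = opendir($dir)) {
                while (($file = readdir($dh)) !== false) {
                    if (is_file($dir . $file)) {
                        // Names on disk are URL-encoded: encode again for the link,
                        // decode for display. rawurldecode() mirrors rawurlencode()
                        // and, unlike urldecode(), leaves a literal "+" untouched.
                        $name = htmlspecialchars(rawurldecode($file), ENT_QUOTES, 'UTF-8');
                        echo "<li><a href='" . $dir . rawurlencode($file) . "'";
                        echo " download='" . $name . "'>" . $name . "</a></li><br />";
                    }
                }
                closedir($dh);
            }
        }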
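
For anyone who wants to reuse the link-building part, it could be pulled into a small function along these lines. This is just a sketch: renderFileLink() is a hypothetical name, and it assumes $dir is a trusted, web-accessible path and that names on disk were stored with rawurlencode().

```php
<?php
/**
 * Hypothetical helper: builds one list item for a file that is stored
 * on disk under a URL-encoded name.
 */
function renderFileLink(string $dir, string $storedName): string
{
    // Encode the stored name once more so the link resolves to it.
    $href = htmlspecialchars($dir . rawurlencode($storedName), ENT_QUOTES, 'UTF-8');
    // Decode for display and escape it for use in HTML.
    $label = htmlspecialchars(rawurldecode($storedName), ENT_QUOTES, 'UTF-8');

    return "<li><a href='{$href}' download='{$label}'>{$label}</a></li>";
}

echo renderFileLink('files/', '%C3%BC.txt'), "\n";
// <li><a href='files/%25C3%25BC.txt' download='ü.txt'>ü.txt</a></li>
```

Escaping the decoded name matters because an uploaded filename can contain quotes or angle brackets, which would otherwise break out of the single-quoted attributes.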
    

    I don't know if this is the best way to solve it, but it works perfectly. I am also aware that it is generally good practice not to generate HTML tags with PHP, but at the moment I have some critical bugs that need addressing, so those come first; after that I'll work on the appearance of the code itself.

    EDIT2

    Another great thing is that I do not have to rename the already uploaded files, which in my case is a big advantage.