
How to iterate over non-English file names in PHP


I have a directory that contains several files, many of which have non-English names. I am using PHP on Windows 7.

I want to list the file names and their contents using PHP.

Currently I am using DirectoryIterator and file_get_contents. This works for English file names but not for non-English (e.g. Chinese) file names.

For example, I have filenames like "एक और प्रोब्लेम.eml", "hello 鶨鶖鵨鶣鎹鎣.eml".

  1. DirectoryIterator is not able to get the filename using ->getFilename()
  2. file_get_contents is also unable to open the file, even if I hard-code the file name as its argument.
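
The approach described above might look roughly like the following sketch. The function name `readAllFiles` and the returned array layout are illustrative assumptions, not the exact code in use; it is this kind of loop that works for English names and fails for the others.

```php
<?php
// Hypothetical helper (name and return shape are assumptions):
// returns an array of file name => file contents for one directory.
function readAllFiles($dir)
{
    $contents = array();
    foreach (new DirectoryIterator($dir) as $entry) {
        // Skip ".", ".." and anything that is not a regular file.
        if ($entry->isDot() || !$entry->isFile()) {
            continue;
        }
        // getPathname() is the directory path joined with the file name.
        $contents[$entry->getFilename()] = file_get_contents($entry->getPathname());
    }
    return $contents;
}
```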

How can I do it?


Solution

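  • This is not possible; it's a limitation of PHP (at least as of 5.3.0, the version debugged below). PHP calls the multibyte ("ANSI") versions of the Windows APIs rather than the wide-character ones, so you're limited to the characters your system code page can represent. Any other character is replaced before PHP ever sees the name.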

    See this answer.

    Directory contents:

    D:\Users\Cataphract\Desktop\teste2>dir
     Volume in drive D is GRANDEDISCO
     Volume Serial Number is 945F-DB89
    
     Directory of D:\Users\Cataphract\Desktop\teste2
    
    01-06-2010  17:16    <DIR>          .
    01-06-2010  17:16    <DIR>          ..
    01-06-2010  17:15                 0 coptic small letter shima follows ϭ.txt
    01-06-2010  17:18                86 teste.php
                   2 File(s)             86 bytes
                   2 Dir(s)  12.178.505.728 bytes free
    

    Test file contents:

    <?php
    exec('pause');
    foreach (new DirectoryIterator(".") as $v) {
        echo $v."\n";
    }
    

    Test file results:

    .
    ..
    coptic small letter shima follows ?.txt
    teste.php
    

    Debugger output:

    Call stack (PHP 5.3.0):

    >   php5ts_debug.dll!readdir_r(DIR * dp=0x02f94068, dirent * entry=0x00a7e7cc, dirent * * result=0x00a7e7c0)  Line 80   C
        php5ts_debug.dll!php_plain_files_dirstream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int count=260, void * * * tsrm_ls=0x028a15c0)  Line 820 + 0x17 bytes   C
        php5ts_debug.dll!_php_stream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int size=260, void * * * tsrm_ls=0x028a15c0)  Line 603 + 0x1c bytes  C
        php5ts_debug.dll!_php_stream_readdir(_php_stream * dirstream=0x02b94280, _php_stream_dirent * ent=0x02b9437c, void * * * tsrm_ls=0x028a15c0)  Line 1806 + 0x16 bytes    C
        php5ts_debug.dll!spl_filesystem_dir_read(_spl_filesystem_object * intern=0x02b94340, void * * * tsrm_ls=0x028a15c0)  Line 199 + 0x20 bytes  C
        php5ts_debug.dll!spl_filesystem_dir_open(_spl_filesystem_object * intern=0x02b94340, char * path=0x02b957f0, void * * * tsrm_ls=0x028a15c0)  Line 238 + 0xd bytes   C
        php5ts_debug.dll!spl_filesystem_object_construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0, long ctor_flags=0)  Line 645 + 0x11 bytes  C
        php5ts_debug.dll!zim_spl_DirectoryIterator___construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0)  Line 658 + 0x1f bytes   C
        php5ts_debug.dll!zend_do_fcall_common_helper_SPEC(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0)  Line 313 + 0x78 bytes   C
        php5ts_debug.dll!ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0)  Line 423  C
        php5ts_debug.dll!execute(_zend_op_array * op_array=0x02b93888, void * * * tsrm_ls=0x028a15c0)  Line 104 + 0x11 bytes    C
        php5ts_debug.dll!zend_execute_scripts(int type=8, void * * * tsrm_ls=0x028a15c0, _zval_struct * * retval=0x00000000, int file_count=3, ...)  Line 1188 + 0x21 bytes C
        php5ts_debug.dll!php_execute_script(_zend_file_handle * primary_file=0x00a7fad4, void * * * tsrm_ls=0x028a15c0)  Line 2196 + 0x1b bytes C
        php.exe!main(int argc=2, char * * argv=0x028a14c0)  Line 1188 + 0x13 bytes  C
        php.exe!__tmainCRTStartup()  Line 555 + 0x19 bytes  C
        php.exe!mainCRTStartup()  Line 371  C
    

    Is it really a question mark?

    dp->fileinfo
    {dwFileAttributes=32 ftCreationTime={...} ftLastAccessTime={...} ...}
        dwFileAttributes: 32
        ftCreationTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
        ftLastAccessTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
        ftLastWriteTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
        nFileSizeHigh: 0
        nFileSizeLow: 0
        dwReserved0: 3435973836
        dwReserved1: 3435973836
        cFileName: 0x02f9409c "coptic small letter shima follows ?.txt"
        cAlternateFileName: 0x02f941a0 "COPTIC~1.TXT"
    dp->fileinfo.cFileName[34]
    63 '?'
    

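    Yes! It's character #63, a literal ASCII question mark. The original character is already gone from the structure filled in by the Windows API, so nothing in PHP code can recover it.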
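
The substitution seen in the debugger output above can be imitated in plain PHP. This is only a sketch under stated assumptions: Windows-1252 stands in for the system code page (the real one depends on the Windows locale), the mbstring extension is assumed to be loaded with its default substitute character, and `looksMangled` is a hypothetical helper name. mbstring's conversion is not the same routine Windows uses internally, but the lossy effect is similar.

```php
<?php
// "\xCF\xAD" is the UTF-8 encoding of U+03ED (COPTIC SMALL LETTER SHIMA).
$utf8Name = "coptic small letter shima follows \xCF\xAD.txt";

// Assumption: Windows-1252 plays the role of the system code page.
// Characters it cannot represent are replaced by mbstring's substitute
// character, which is "?" unless configured otherwise.
$ansiName = mb_convert_encoding($utf8Name, 'Windows-1252', 'UTF-8');

echo $ansiName, "\n";
echo ord($ansiName[34]), "\n"; // same index the debugger inspected

// Hypothetical helper: "?" is not a legal character in a Windows file
// name, so finding one in a listed name suggests the real name could
// not be represented in the code page.
function looksMangled($name)
{
    return strpos($name, '?') !== false;
}

var_dump(looksMangled($ansiName));
```

A check like `looksMangled` can at least flag the entries that PHP will not be able to open, even though it cannot recover their real names.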