php, utf-8, ldap, character

'�' character from LDAP


I am getting strange characters from an LDAP server when I look up user info. If a value contains Turkish characters such as 'ç', they are replaced with '�'. As a workaround, I convert the string to UTF-8 and then use str_replace to fix the characters that are still wrong. My function is:

function utf8char($str) {
    // Swap the Latin-1 look-alikes for the intended Turkish letters.
    $search = array('Ý', 'ý', 'þ', 'Þ', 'ð', 'Ð');
    $replace = array('İ', 'ı', 'ş', 'Ş', 'ğ', 'Ğ');
    return str_replace($search, $replace, $str);
}

But this sometimes causes problems, so I need to detect whether the string contains the '�' character before fixing it. strpos does not work. Can anyone help with this? And what exactly is this '�' character? I would be happy if someone could explain.

Edit: Here is my code snippet:

$name = $ldapHandler->get_user_info('username')['name'];
echo $name;
echo utf8_decode($name);
echo mb_convert_encoding($name,'utf-8');
echo utf8char(mb_convert_encoding($name,'utf-8'));

And the output of this code:

Bilgi ��lem Daire Ba�kanl���
Bilgi ?lem Daire Ba?kanl??
Bilgi Ýþlem Daire Baþkanlýðý
Bilgi İşlem Daire Başkanlığı (this is the correct string)
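
A note on the '�' itself: it is U+FFFD, the replacement character that a browser or terminal draws when it meets bytes that are not valid UTF-8. It is normally not stored in the string, which would explain why strpos cannot find it. The sketch below illustrates this under the assumption that the value arrives as single-byte Turkish text; the sample bytes are hypothetical, chosen to mimic the output above.

```php
<?php
// Hypothetical raw value: "Başkanlığı" as single-byte ISO-8859-9 text.
// In that encoding 0xFE = ş, 0xFD = ı, 0xF0 = ğ.
$raw = "Ba\xFEkanl\xFD\xF0\xFD";

// U+FFFD (the replacement character) encoded as UTF-8 is three bytes.
$replacementChar = "\xEF\xBF\xBD";

// The string does not contain U+FFFD, so searching for it finds nothing.
$containsReplacement = strpos($raw, $replacementChar) !== false;

// What can be detected is that the bytes are not valid UTF-8.
$isValidUtf8 = mb_check_encoding($raw, 'UTF-8');

// Dumping the bytes as hex shows what the server really sent.
$hex = bin2hex($raw);

var_dump($containsReplacement, $isValidUtf8);
echo $hex, PHP_EOL;
```

Both checks report false here: the replacement character is absent, and the string is not valid UTF-8. Checking validity with mb_check_encoding is therefore a more reliable test than searching for '�'.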

Solution

  • It has been a long time, but I decided to share my solution for anyone facing the same problem.

    This function worked for me:

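    function repair($value) {
    
        // Drop any bytes that are not valid UTF-8 (errors are suppressed).
        $res = @iconv("UTF-8", "UTF-8//IGNORE", $value);
    
        // If bytes were dropped, the input was not UTF-8:
        // fall back to the single-byte conversion below.
        if (strlen($value) != strlen($res)) {
            return w1250_to_utf8($value);
        }
    
        return $res;
    }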
    
    function w1250_to_utf8($text) {
        // map based on:
        // http://konfiguracja.c0.pl/iso02vscp1250en.html
        // http://konfiguracja.c0.pl/webpl/index_en.html#examp
        // http://www.htmlentities.com/html/entities/
        $map = array(
            chr(0x8A) => chr(0xA9),
            chr(0x8C) => chr(0xA6),
            chr(0x8D) => chr(0xAB),
            chr(0x8E) => chr(0xAE),
            chr(0x8F) => chr(0xAC),
            chr(0x9C) => chr(0xB6),
            chr(0x9D) => chr(0xBB),
            chr(0xA1) => chr(0xB7),
            chr(0xA5) => chr(0xA1),
            chr(0xBC) => chr(0xA5),
            chr(0x9F) => chr(0xBC),
            chr(0xB9) => chr(0xB1),
            chr(0x9A) => chr(0xB9),
            chr(0xBE) => chr(0xB5),
            chr(0x9E) => chr(0xBE),
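            // Characters with no single-byte target are written as HTML
            // entities and decoded to UTF-8 by html_entity_decode() below.
            chr(0x80) => '&euro;',
            chr(0x82) => '&sbquo;',
            chr(0x84) => '&bdquo;',
            chr(0x85) => '&hellip;',
            chr(0x86) => '&dagger;',
            chr(0x87) => '&Dagger;',
            chr(0x89) => '&permil;',
            chr(0x8B) => '&lsaquo;',
            chr(0x91) => '&lsquo;',
            chr(0x92) => '&rsquo;',
            chr(0x93) => '&ldquo;',
            chr(0x94) => '&rdquo;',
            chr(0x95) => '&bull;',
            chr(0x96) => '&ndash;',
            chr(0x97) => '&mdash;',
            chr(0x99) => '&trade;',
            chr(0x9B) => '&rsaquo;',
            chr(0xA6) => '&brvbar;',
            chr(0xA9) => '&copy;',
            chr(0xAB) => '&laquo;',
            chr(0xAE) => '&reg;',
            chr(0xB1) => '&plusmn;',
            chr(0xB5) => '&micro;',
            chr(0xB6) => '&para;',
            chr(0xB7) => '&middot;',
            chr(0xBB) => '&raquo;',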
        );
    
        $search = array('Ý', 'ý', 'þ', 'Þ', 'ð', 'Ð');
        $replace = array('İ', 'ı', 'ş', 'Ş', 'ğ', 'Ğ');
    
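        // Treat the remapped bytes as ISO-8859-1 and convert them to UTF-8.
        // The source encoding is passed explicitly so the global
        // mb_internal_encoding() setting is left untouched.
        $utf8 = mb_convert_encoding(strtr($text, $map), 'UTF-8', 'ISO-8859-1');
        $utf8 = html_entity_decode($utf8, ENT_QUOTES, 'UTF-8');
        return str_replace($search, $replace, $utf8);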
    }
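
The pairs in $search and $replace (Ý/İ, ý/ı, þ/ş, Þ/Ş, ð/ğ, Ð/Ğ) are exactly the six positions where ISO-8859-9 (Latin-5, Turkish) differs from ISO-8859-1. That suggests the directory is returning ISO-8859-9 bytes. If that assumption holds, a shorter alternative may be to validate and convert in one step. This is only a sketch: the function name to_utf8 and its default fallback encoding are hypothetical, and the sample bytes are made up to match the output shown in the question.

```php
<?php
/**
 * Return $value as UTF-8.
 * Valid UTF-8 input is returned unchanged; anything else is assumed
 * to be encoded in $fallback (an assumption, adjust for your server).
 */
function to_utf8($value, $fallback = 'ISO-8859-9') {
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// Hypothetical raw bytes for "Bilgi İşlem Daire Başkanlığı" in ISO-8859-9.
$raw = "Bilgi \xDD\xFElem Daire Ba\xFEkanl\xFD\xF0\xFD";

echo to_utf8($raw), PHP_EOL;
```

Because the validity check comes first, values that are already UTF-8 pass through untouched, so the function can be applied to every attribute without double-encoding. The trade-off is that the fallback encoding is a guess; if the server actually uses a different code page, the fallback argument has to change accordingly.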