
Character encoding/decoding returns ? signs or weird results when form is submitted


I know there are many similar posts about this, but I couldn't solve my problem even after going through them. I'm trying to print the exact search keywords when I hit the Search button, but I'm seeing encoded versions that I cannot decode. I read up on utf8_decode and iconv, with no luck so far. My site uses many languages, but I'm only struggling with the characters below.

Note: I'm using utf8mb4_unicode_ci as the collation in MySQL, and the same keywords were inserted into the table as Ç, Ğ, İ, Ö, Ş, Ü, ç, ğ, ı, ö, ş, ü, so because of the encoding differences the search will fail in most cases.

I have also set the internal encoding with mb_internal_encoding("UTF-8");
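
For context, a minimal sketch of what that setting does, assuming the mbstring extension is loaded. mb_internal_encoding() only sets the default encoding that mb_* functions use to interpret a string; it does not transcode the bytes PHP receives in $_POST. The variable names are illustrative only.

```php
<?php
// "\u{011E}" is Ğ, stored here as its two UTF-8 bytes (0xC4 0x9E).
$value = "\u{011E}";

mb_internal_encoding('ISO-8859-1');
$lengthAsLatin1 = mb_strlen($value); // each byte counted as one character

mb_internal_encoding('UTF-8');
$lengthAsUtf8 = mb_strlen($value);   // the two bytes counted as one character

// The bytes themselves never changed; only their interpretation did.
$hex = bin2hex($value);

printf("%d %d %s\n", $lengthAsLatin1, $lengthAsUtf8, $hex); // prints "2 1 c49e"
```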

SEARCH KEYWORD: Ç, Ğ, İ, Ö, Ş, Ü, ç, ğ, ı, ö, ş, ü

SITE

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <form method="post" action="search.php">
            Keyword: <input type="text" name="keywords" />
            <button type="submit" class="btn btn-default">Search</button>
        </form>
    </body>
</html>

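search.php

<?php
// Dump the raw POST data, then echo the keyword three ways.
var_dump($_POST);
echo $_POST['keywords'];                                // as received
echo '<br />';
echo utf8_decode($_POST['keywords']);                   // UTF-8 -> ISO-8859-1
echo '<br />';
echo iconv("ISO-8859-1", "UTF-8", $_POST['keywords']);  // ISO-8859-1 -> UTF-8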

OUTPUT when the UTF-8 meta tag EXISTS

array (size=1)
      'keywords' => string 'Ç, Ğ, İ, Ö, Ş, Ü, ç, ğ, ı, ö, ş, ü' (length=46)
Ç, Ğ, İ, Ö, Ş, Ü, ç, ğ, ı, ö, ş, ü
Ç, ?, ?, Ö, ?, Ü, ç, ?, ?, ö, ?, ü
Ç, Ğ, İ, Ö, Ş, Ü, ç, ğ, ı, ö, ş, ü
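
The length=46 and the ? signs in this output can be reproduced outside the form. The sketch below assumes the mbstring extension is available. It uses mb_convert_encoding() for the same UTF-8 to ISO-8859-1 conversion that utf8_decode() performs, since utf8_decode() is deprecated as of PHP 8.2. The six characters with no ISO-8859-1 equivalent (Ğ, İ, Ş, ğ, ı, ş) are the ones that turn into ?.

```php
<?php
// The twelve search characters, written as code points so the example does
// not depend on the encoding this file is saved in.
$chars = ["\u{00C7}", "\u{011E}", "\u{0130}", "\u{00D6}", "\u{015E}", "\u{00DC}",
          "\u{00E7}", "\u{011F}", "\u{0131}", "\u{00F6}", "\u{015F}", "\u{00FC}"];
$keywords = implode(', ', $chars);

$byteLength = strlen($keywords);             // 46: every letter takes 2 bytes in UTF-8
$charLength = mb_strlen($keywords, 'UTF-8'); // 34: 12 letters + 11 ", " separators

// Make the replacement for unrepresentable characters explicit ("?").
mb_substitute_character(0x3F);

// UTF-8 -> ISO-8859-1: characters missing from ISO-8859-1 are replaced by "?".
$latin1 = mb_convert_encoding($keywords, 'ISO-8859-1', 'UTF-8');

// Convert back to UTF-8 only so the result is readable on a UTF-8 page.
$readable = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo $byteLength, ' ', $charLength, "\n";
echo $readable, "\n";
```

The substitution is lossy: once a character has become ?, no later conversion can recover it.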

OUTPUT when the UTF-8 meta tag is REMOVED

// This will also break the front end for certain characters.
array (size=1)
      'keywords' => string 'Ç, &#286;, &#304;, Ö, &#350;, Ü, ç, &#287;, &#305;, ö, &#351;, ü' (length=64)
Ç, Ğ, İ, Ö, Ş, Ü, ç, ğ, ı, ö, ş, ü
?, Ğ, İ, ?, Ş, ?, ?, ğ, ı, ?, ş, ?
Ç, Ğ, İ, Ö, Ş, Ãœ, ç, ğ, ı, ö, ş, ü

Solution

  • Adding accept-charset="ISO-8859-1" to the form element solved the problem.

    OUTPUT

    array (size=1)
      'keywords' => string 'Ç, &#286;, &#304;, Ö, &#350;, Ü, ç, &#287;, &#305;, ö, &#351;, ü' (length=64)
    
    Ç, Ğ, İ, Ö, Ş, Ü, ç, ğ, ı, ö, ş, ü
    ?, Ğ, İ, ?, Ş, ?, ?, ğ, ı, ?, ş, ?
    Ç, Ğ, İ, Ö, Ş, Ãœ, ç, ğ, ı, ö, ş, ü
    

    Note: The result is the same whether or not I call mb_internal_encoding("UTF-8");
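
With this form setting, characters that do not exist in ISO-8859-1 arrive as numeric references such as &#286;, and the rest arrive as single bytes, which is why the dumped string is 64 bytes long. To compare the keyword against the UTF-8 values stored in MySQL, it may need to be converted back first. The following is a rough sketch under assumptions: the function name legacy_form_value_to_utf8 is hypothetical and not from the original post, the mbstring extension is assumed to be loaded, and the browser is assumed to have sent the value in the charset named in accept-charset.

```php
<?php
/**
 * Hypothetical helper: converts a form value submitted in a single-byte
 * charset (with numeric character references for everything else) to UTF-8.
 */
function legacy_form_value_to_utf8(string $raw, string $formCharset = 'ISO-8859-1'): string
{
    // Step 1: re-encode the single-byte characters (e.g. 0xC7 for Ç) as UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $formCharset);

    // Step 2: turn numeric references such as &#286; into real characters.
    return html_entity_decode($utf8, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// The 64-byte payload from the dump above, written with byte escapes.
$submitted = "\xC7, &#286;, &#304;, \xD6, &#350;, \xDC, "
           . "\xE7, &#287;, &#305;, \xF6, &#351;, \xFC";

$keywords = legacy_form_value_to_utf8($submitted);

echo strlen($submitted), ' -> ', strlen($keywords), "\n"; // prints "64 -> 46"
```

One caveat with this approach: html_entity_decode() also decodes entities that the user typed literally (for example &amp;), so the result is not guaranteed to match the user's input in every case.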