Search code examples
phppdocharacter-encodingmariadb

PHP - converting data from MariaDB to JSON - � wrong encoding


I seem to have some problem with encoding, but I can't pinpoint it.

PHPMyAdmin says:

Server type: MariaDB
Server version: 10.3.39-MariaDB-log - MariaDB Server
Server charset: ISO 8859-2 Central European (latin2)
Server connection collation: utf8mb4_unicode_ci

I can't change the SQL server in any way, this is provided by my website hosting provider.

All my dbs, tables and columns use utf8mb4_unicode_ci. All files are encoded as UTF-8. The values display properly both in PHPMyAdmin and in MySQLWorkbench. Other scripts on my website work fine, displaying english, russian, chinese, etc. Just this one is not complying for some reason. I tried inserting the data through PHPMA, Workbench, and even from the very same script.

I connect using PDO, with charset specified, via an included file:

<?php
if (!isset($pdo))
{
    $DBHOST = 'localhost';
    $DBNAME = '***';
    $DBUSER = '***';
    $DBPASS = '***';
    $DBCHRS = 'utf8mb4';

    $dsn = 'mysql:host='.$DBHOST.';dbname='.$DBNAME.';charset='.$DBCHRS;
    $options = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC, PDO::ATTR_EMULATE_PREPARES => false, PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false];
    try
    {
        $pdo = new PDO($dsn, $DBUSER, $DBPASS, $options);
    }
    catch (\PDOException $e)
    {
        echo 'Could not connect to the database!<br>Message: ', $e->getMessage(), '<br>Code: ', $e->getCode();
        exit();
    }
}
?>

and then

$json = [];
$json['people'] = [];
$json['relations'] = [];

$stmt = $pdo->prepare('SELECT * FROM `tree_people`;');
$stmt->execute();
while($row = $stmt->fetch(PDO::FETCH_ASSOC))
    $json['people'][] = $row;

$stmt = $pdo->prepare('SELECT * FROM `tree_relations`;');
$stmt->execute();
while($row = $stmt->fetch(PDO::FETCH_ASSOC))
    $json['relations'][] = $row;

/*/
header('Access-Control-Allow-Origin: *');
header('Content-Type: application/json; charset=utf-8');
/*/
header('Content-Type: text/html; charset=utf-8');
//*/

//*/
echo '<pre>';
print_r($json);
echo '</pre>';
//*/

echo '<pre>';
echo json_encode($json, JSON_UNESCAPED_UNICODE);
echo '</pre>';

JSON fails to generate (empty string?) and the content displayed with print_r has all the special characters showing as �.

I'm going crazy, what is going on?

Edit: I can properly read data from other tables. However it is only possible when I write the data to the DB through my own editor, not through PMA. See https://herhor.net/news/?id=1 when I first inserted it via PMA, it was full of unknown characters. However, now when I read it in PMA or Workbench, it is full of scrambled characters.

It seems that there is some mismatch between the encoding used by PMA/Workbench and the one used by all my scripts. Shouldn't the DB deal with it automatically?

Edit2: As requested, SHOW VARIABLES LIKE 'char%'; for Workbench:

character_set_client    utf8mb4
character_set_connection    utf8mb4
character_set_database  utf8mb4
character_set_filesystem    binary
character_set_results   utf8mb4
character_set_server    latin2
character_set_system    utf8
character_sets_dir  /usr/share/mysql/charsets/

For PMA:

character_set_client    utf8mb4
character_set_connection    utf8mb4
character_set_database  utf8mb4
character_set_filesystem    binary
character_set_results   utf8mb4
character_set_server    latin2
character_set_system    utf8
character_sets_dir  /usr/share/mysql/charsets/

For PHP PDO:

Array
(
    [0] => stdClass Object
        (
            [Variable_name] => character_set_client
            [Value] => latin2
        )

    [1] => stdClass Object
        (
            [Variable_name] => character_set_connection
            [Value] => latin2
        )

    [2] => stdClass Object
        (
            [Variable_name] => character_set_database
            [Value] => utf8mb4
        )

    [3] => stdClass Object
        (
            [Variable_name] => character_set_filesystem
            [Value] => binary
        )

    [4] => stdClass Object
        (
            [Variable_name] => character_set_results
            [Value] => latin2
        )

    [5] => stdClass Object
        (
            [Variable_name] => character_set_server
            [Value] => latin2
        )

    [6] => stdClass Object
        (
            [Variable_name] => character_set_system
            [Value] => utf8
        )

    [7] => stdClass Object
        (
            [Variable_name] => character_sets_dir
            [Value] => /usr/share/mysql/charsets/
        )

)

Also here is the SQL import/export file for both tables: https://pastebin.com/33Ap4Vje


Solution

  • As your edit shows, the problem is definitely with the PDO connection as it is showing latin2 in the various character_set_* variables.

    Does adding SET NAMES utf8mb4 to your $options array help:

    $options = [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
        PDO::ATTR_EMULATE_PREPARES => false,
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
        PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8mb4'
    ];