php, sql, mysql, utf-8

UTF-8 replacement characters between every character when reading file in PHP


I've written some code to parse a tab-delimited file and import it into a MySQL table. The file is UTF-8, as is the MySQL table.

The import works, but when I view a field in the table, every other character shows up as the Unicode replacement character (�). For example, the number 4308817 in the raw file shows up in the database as "4�3�0�8�8�1�7". The data does contain some non-ASCII characters (such as ë), so UTF-8 support is required.

I've tried all sorts of things with utf8_encode, utf8_decode, mb_convert_encoding, etc., and nothing makes the values show up without the � characters.


$lines = file($dir . '/' . $file);
foreach ($lines as $line_num => $line) {
    // Skip the header row.
    if ($line_num === 0) {
        continue;
    }

    $values = [];
    foreach (explode("\t", $line) as $field) {
        // Escape double quotes by hand and wrap the trimmed value in quotes.
        $values[] = "\"" . str_replace('"', '\"', trim($field)) . "\"";
    }
    $AddSQL = "INSERT INTO `$table` VALUES(" . implode(", ", $values) . ")";

    $dbconn->query($AddSQL);
}
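
A � after every ASCII character is often a sign that the file is really UTF-16, since each ASCII character is then stored with an extra NUL byte. A quick way to check is to look at the first bytes of the file. The sketch below is only a rough heuristic, and `guessEncoding` is a hypothetical helper name, not something from the post:

```php
<?php
// Hypothetical helper: guess an encoding from the first bytes of a file.
function guessEncoding(string $bytes): string
{
    // A byte order mark identifies the encoding outright.
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }

    // Without a BOM, NUL bytes in mostly-ASCII text point to UTF-16:
    // the NUL comes first for big-endian, second for little-endian.
    $sample = substr($bytes, 0, 64);
    if (strpos($sample, "\x00") !== false) {
        return $sample[0] === "\x00" ? 'UTF-16BE' : 'UTF-16LE';
    }

    // preg_match with the /u flag returns 1 only for valid UTF-8 input.
    return preg_match('//u', $bytes) === 1 ? 'UTF-8' : 'unknown';
}

// "43" stored as UTF-16BE: every digit is preceded by a NUL byte.
echo guessEncoding("\x00" . "4" . "\x00" . "3"), "\n";
```

On a real file, something like `bin2hex(file_get_contents($path, false, null, 0, 16))` shows the same bytes directly.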

Solution

  • I found the solution by stumbling onto UConverter::transcode: https://www.php.net/manual/en/uconverter.transcode.php

    mb_convert_encoding did not work for me, but this did, which suggests the file was actually encoded as UTF-16BE rather than UTF-8:

    $line = UConverter::transcode($line, 'UTF-8', 'UTF-16BE');
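
The fix above transcodes one line at a time. A possible variation is to transcode the whole file before splitting it, so the line and tab splitting work on UTF-8 text rather than raw UTF-16 bytes. This is a minimal sketch, assuming the intl extension is enabled and the input really is UTF-16BE; `utf16beToRows` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turn UTF-16BE tab-delimited bytes into rows of UTF-8 fields.
function utf16beToRows(string $raw): array
{
    // Transcode the whole buffer first (target encoding, then source encoding).
    $text = UConverter::transcode($raw, 'UTF-8', 'UTF-16BE');
    if ($text === false) {
        throw new RuntimeException('Transcoding from UTF-16BE failed');
    }

    // Drop a leading byte order mark if one survived the conversion.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($text, 3);
    }

    $rows = [];
    // Split on any common line ending and ignore empty lines.
    foreach (preg_split('/\r\n|\r|\n/', $text, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $rows[] = array_map('trim', explode("\t", $line));
    }
    return $rows;
}

// Build a small UTF-16BE sample in memory instead of reading a real file.
$raw = UConverter::transcode("id\tname\n4308817\tZoë\n", 'UTF-16BE', 'UTF-8');
print_r(utf16beToRows($raw));
```

In the import loop, `file_get_contents($dir . '/' . $file)` would supply `$raw`, and the first returned row would be the header to skip.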