I've written some code to parse a tab-delimited file and import it into a MySQL table. The file is UTF-8, as is the MySQL table.
The import works, but when I view the data in my table, every other character in a field shows up as the Unicode replacement character, �. For example, the number 4308817 in the raw file appears in the database as "4�3�0�8�8�1�7". The data does contain some non-ASCII characters (ë and so on), so UTF-8 support is required.
I've tried all sorts of things with utf8_encode, utf8_decode, mb_convert_encoding, etc., and nothing gets these values to show up without the � characters. Here is the import code:
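```php
// Read the whole file into an array of lines (each keeps its line ending).
$lines = file($dir . '/' . $file);

foreach ($lines as $line_num => $line) {
    // Skip the header row.
    if ($line_num === 0) {
        continue;
    }

    // Quote each tab-separated field, escaping embedded double quotes.
    // (This hand-rolled escaping ignores backslashes; prepared statements are safer.)
    $values = [];
    foreach (explode("\t", $line) as $field) {
        $values[] = '"' . str_replace('"', '\"', trim($field)) . '"';
    }

    $AddSQL = "INSERT INTO `$table` VALUES(" . implode(', ', $values) . ")";
    $dbconn->query($AddSQL);
}
```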
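The "4�3�0�…" pattern is typical of UTF-16 data being read as if it were UTF-8. In UTF-16 every ASCII character takes two bytes, one of which is 0x00, and those zero bytes end up between the digits (the viewer then most likely renders them as �). A minimal sketch showing those extra bytes, assuming the mbstring extension is available (the sample value is the one from the question):

```php
<?php
// Encode the sample value as UTF-16LE and dump its raw bytes in hex.
$utf16 = mb_convert_encoding('4308817', 'UTF-16LE', 'UTF-8');
echo bin2hex($utf16), PHP_EOL; // 3400330030003800380031003700

// Each digit is followed by a 0x00 byte, so the byte length doubles (7 -> 14).
echo strlen($utf16), PHP_EOL;  // 14
```

If the first bytes of the raw import file look like this (or begin with `FF FE` / `FE FF`), the file is UTF-16, whatever the editor that saved it claimed.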
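Update: it looks like I found the solution by stumbling onto this function: https://www.php.net/manual/en/uconverter.transcode.php

`mb_convert_encoding` did not work for me, but this did (applied to each `$line` before it is exploded on tabs):

```php
// Convert the line from UTF-16BE to UTF-8 (arguments: string, to-encoding, from-encoding).
$line = UConverter::transcode($line, 'UTF-8', 'UTF-16BE');
```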
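Since `UConverter::transcode($line, 'UTF-8', 'UTF-16BE')` treats each line's bytes as UTF-16BE and converts them to UTF-8, the file was evidently UTF-16, not UTF-8. Converting line by line after `file()` has already split the raw bytes on 0x0A can misalign UTF-16 code units (in UTF-16LE the newline is `0A 00`, so the `00` byte lands at the start of the next line). One alternative is to convert the whole file once, before splitting. A minimal sketch, assuming the mbstring extension is installed and that a UTF-16 file starts with a byte order mark (BOM); the helper name `readAsUtf8Lines` is hypothetical:

```php
<?php
/**
 * Hypothetical helper: read a file that may be UTF-16 (LE or BE, with BOM)
 * or UTF-8, and return its lines as UTF-8 strings without line endings.
 */
function readAsUtf8Lines(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Detect the encoding from the byte order mark, then drop the BOM.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $text = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $text = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    } elseif (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($raw, 3); // UTF-8 with BOM: just strip the BOM
    } else {
        $text = $raw; // Assumption: no BOM means the file is already UTF-8.
    }

    // Split on CRLF or LF only after the text has become UTF-8.
    return preg_split('/\r?\n/', rtrim($text, "\r\n"));
}
```

With this, `$lines = readAsUtf8Lines($dir . '/' . $file);` can replace the `file()` call and the rest of the loop stays as it is. A file without a BOM is assumed to be UTF-8 here; a BOM-less UTF-16 file would need its encoding passed in explicitly.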