To avoid "monster characters" (garbled text), I chose to store non-English characters in the database (MySQL) as Unicode numeric character references (NCRs). However, the PDF library I use (FPDF) does not decode NCRs; it prints the stored data literally, like:
&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;
but I want it to display like:
這個一個例子
Is there a way to convert the NCR form back to the original characters?
P.S. The sentence means "this is an example" in Traditional Chinese.
P.S. I know the NCR form wastes storage space, but it seems to me the safest way to store non-English characters. Correct me if I am wrong. Thanks.
There is a simpler solution, using the PHP mbstring extension.
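// Convert any decimal NCRs to Unicode characters
$string = "&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;";
$output = preg_replace_callback(
    '/(&#[0-9]+;)/u',
    function ($m) {
        return utf8_entity_decode($m[1]);
    },
    $string
);
echo $output; // 這個一個例子

// Callback function for the regex
function utf8_entity_decode($entity) {
    // Conversion map: first code point, last code point, offset, mask.
    // The range is widened to 0x10FFFF so characters outside the BMP are decoded too.
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
    return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}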
The 'utf8_entity_decode' function is from PHP.net (Andrew Simpson): http://php.net/manual/ru/function.mb-decode-numericentity.php#48085. I modified the code slightly to avoid the deprecated 'e' modifier in the regex.
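As a side note, mb_decode_numericentity() scans the whole string for entities by itself, so the regex wrapper may not be strictly necessary. Here is a minimal sketch of that idea, assuming the stored value is valid UTF-8 and contains only decimal NCRs; the variable names ($stored, $decoded, $alt) are just placeholders for illustration. The second call uses html_entity_decode() as an alternative; note that it also decodes named entities such as &amp;amp;, which may or may not be what you want.

```php
<?php
// Hypothetical input: the example sentence as it would be stored in the database.
$stored = '&#36889;&#20491;&#19968;&#20491;&#20363;&#23376;';

// Conversion map: first code point, last code point, offset, mask.
// This range covers every Unicode code point.
$convmap = [0x0, 0x10FFFF, 0, 0x1FFFFF];

// Decode every numeric entity in the string in one call (needs the mbstring extension).
$decoded = mb_decode_numericentity($stored, $convmap, 'UTF-8');
echo $decoded, PHP_EOL; // 這個一個例子

// Alternative without mbstring: also decodes named entities like &amp; and &lt;.
$alt = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $alt, PHP_EOL; // 這個一個例子
```

Either result is a plain UTF-8 string. Whether FPDF can render it still depends on the font and encoding support of your FPDF setup, which is outside what this snippet covers.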