I have a set of data in a database that was entered with Unicode characters, but the escape sequences were stored as literal strings. That is, where there should be an apostrophe (’), I've actually got the literal text \u2019. So I now need to convert this into its character representation, which is ’. Firstly, it is quite easy to change the string into its entity version, &#x2019;, and then I need to turn that into the correct UTF-8 multibyte string.
I have attempted to do this in a number of ways; on my local server I can extract the characters with preg_match and then pass each one to the following function:
mb_convert_encoding($string, "UTF-8", "HTML-ENTITIES");
Sounds quite sensible, and works without issue. Turning off the UTF-8 charset in the browser shows that this has actually been converted into a multibyte sequence (three bytes for ’) when read using the browser's default encoding.
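For reference, the conversion described above can also be sketched without going through HTML entities at all. This is only an illustration under stated assumptions: `unescape_unicode` is a hypothetical helper name, and it assumes the stored escapes are always `\u` followed by exactly four hex digits (UTF-16 code units), which is what the example data suggests.

```php
<?php
// Hypothetical helper: replaces literal \uXXXX escapes with UTF-8 characters.
function unescape_unicode(string $text): string
{
    // Match whole runs of escapes so that surrogate pairs stay together.
    return preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $m): string {
            // Drop the "\u" prefixes, leaving the hex digits of UTF-16BE code units.
            $hex = str_replace('\\u', '', $m[0]);
            return mb_convert_encoding(hex2bin($hex), 'UTF-8', 'UTF-16BE');
        },
        $text
    );
}

// Single quotes keep the backslash literal, as it would be in the database.
echo bin2hex(unescape_unicode('it\u2019s')), "\n"; // 6974e2809973
```

Converting from UTF-16BE rather than from HTML-ENTITIES avoids depending on how a particular mbstring build parses entities.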
However, the exact same code, when run in my production environment, produces the dreaded "missing symbol" box when rendered as UTF-8. Turning off UTF-8 shows that it has produced a byte stream that renders as ò°‘£. It appears to be outputting 4 bytes rather than 3; I don't know if that is relevant, as I'm not well read on character encoding.
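On the 4-bytes-versus-3 observation, here is a rough sketch that decodes both byte sequences back to code points. It assumes that ò°‘£ is the Windows-1252 rendering of the bytes F2 B0 91 A3 (an assumption, since the raw bytes are not shown above), and `utf8_code_point` is a hypothetical helper that does no validation.

```php
<?php
// Hypothetical helper: decode a single 3- or 4-byte UTF-8 sequence (no validation).
function utf8_code_point(string $bytes): int
{
    $b = array_values(unpack('C*', $bytes));
    if (count($b) === 3) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
         | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
}

// Expected UTF-8 bytes for U+2019.
printf("%d\n", utf8_code_point("\xE2\x80\x99"));     // 8217 (0x2019)
// Assumed production bytes (see the assumption above).
printf("%d\n", utf8_code_point("\xF2\xB0\x91\xA3")); // 722019
```

Under that assumption the production bytes decode to code point 722019, which equals 72 * 10000 + 2019, and 72 is ord('x') - ord('0'). That would be consistent with the x of a hexadecimal entity being read as a decimal digit by the older mbstring build, though this is only a guess. If the entity was written in hexadecimal form, trying the decimal form &#8217; would be one way to check.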
I assume that the issue is with my mbstring settings. Here are the mbstring settings from my local server:
Multibyte Support enabled
Multibyte string engine libmbfl
HTTP input encoding translation disabled
Multibyte (japanese) regex support enabled
Multibyte regex (oniguruma) version 4.7.1
mbstring.detect_order no value no value
mbstring.encoding_translation Off Off
mbstring.func_overload 0 0
mbstring.http_input auto auto
mbstring.http_output UTF-8 UTF-8
mbstring.http_output_conv_mimetypes ^(text/|application/xhtml\+xml) ^(text/|application/xhtml\+xml)
mbstring.internal_encoding UTF-8 UTF-8
mbstring.language neutral neutral
mbstring.strict_detection Off Off
mbstring.substitute_character no value no value
There are a few differences on my production environment:
Multibyte Support enabled
Multibyte string engine libmbfl
Multibyte (japanese) regex support enabled
Multibyte regex (oniguruma) version 3.7.1
mbstring.detect_order no value no value
mbstring.encoding_translation Off Off
mbstring.func_overload 0 0
mbstring.http_input auto auto
mbstring.http_output UTF-8 UTF-8
mbstring.internal_encoding UTF-8 UTF-8
mbstring.language neutral neutral
mbstring.strict_detection Off Off
mbstring.substitute_character no value no value
Anyone see what I'm doing wrong?
See if this can help you: hex2ascii and ascii2hex
ADDED on 09-19-2012:
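// Returns the bytes of $ascii as space-separated, uppercase, two-digit hex values.
function ascii2hex($ascii)
{
    $hex = '';
    for ($i = 0; $i < strlen($ascii); $i++)
    {
        // Use [] for string offsets; the {} form was removed in PHP 8.
        $byte = strtoupper(dechex(ord($ascii[$i])));
        // Left-pad single-digit values with a zero.
        $byte = str_repeat('0', 2 - strlen($byte)).$byte;
        $hex .= $byte." ";
    }
    return $hex;
}

// Converts a hex dump (spaces optional) back into the raw byte string.
function hex2ascii($hex)
{
    $ascii = '';
    $hex = str_replace(" ", "", $hex);
    for ($i = 0; $i < strlen($hex); $i += 2)
    {
        $ascii .= chr(hexdec(substr($hex, $i, 2)));
    }
    return $ascii;
}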
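As a side note, PHP's built-in bin2hex() and hex2bin() cover much the same ground as the two helpers above. A minimal sketch follows; `hex_dump_bytes` is a hypothetical name used only for illustration, and it is handy for comparing the bytes produced on the local and production servers.

```php
<?php
// Hypothetical helper: hex dump of a byte string using built-ins only.
function hex_dump_bytes(string $bytes): string
{
    // bin2hex() returns lowercase hex without separators; split it into byte pairs.
    return strtoupper(implode(' ', str_split(bin2hex($bytes), 2)));
}

echo hex_dump_bytes("\xE2\x80\x99"), "\n"; // E2 80 99
echo bin2hex(hex2bin('e28099')), "\n";     // e28099 (round trip)
```

Unlike ascii2hex, this version does not leave a trailing space after the last byte.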