Tags: php, utf-8, cp1252

hex2bin() in PHP returns garbled special characters


I have a hex string that I want to convert to a regular string, but I get strange characters. Why?

$string = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F";
$productnamehex = hex2bin($string);

result:

Reloj Nami � SKMEI � Acu�tico

should show:

Reloj Nami – SKMEI – Acuático

I tried utf8_encode and utf8_decode, but nothing seems to work.

PHP test code:

<?php
$string = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F";
$productnamehex = hex2bin($string);
echo $productnamehex;

Solution

  • Your string is encoded in Windows-1252 (cp1252).

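    The function utf8_encode() is misleading: it only converts from ISO-8859-1 to UTF-8, so it translates your string only partially. cp1252 matches ISO-8859-1 for most bytes but assigns printable characters to the range 0x80-0x9F, including the en dash (0x96) that appears in your string. utf8_encode() turns that byte into the invisible control character U+0096 instead of an en dash. It is also deprecated as of PHP 8.2.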
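
    To see the difference at the byte level, here is a minimal sketch. It uses mb_convert_encoding() with 'ISO-8859-1' as a stand-in for utf8_encode(), which performs the same ISO-8859-1 to UTF-8 conversion, so the snippet runs without deprecation notices on PHP 8.2+. The variable names are illustrative, and the mbstring extension is assumed to be loaded.

    ```php
    <?php
    // Byte 0x96: en dash in cp1252, but the control character U+0096 in ISO-8859-1.
    $byte = "\x96";

    // Same conversion utf8_encode() performs: input is treated as ISO-8859-1.
    $asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

    // Conversion with the correct source encoding.
    $asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

    echo bin2hex($asLatin1), "\n"; // c296   (U+0096, an invisible control character)
    echo bin2hex($asCp1252), "\n"; // e28093 (U+2013, en dash)
    ```

    Both results are valid UTF-8, which is why the wrong conversion fails silently instead of raising an error.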

    To properly convert the string:

    $hex = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F";
    $cp1252 = hex2bin($hex);
    $utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'cp1252');
    
    var_dump($hex, $cp1252, $utf8);
    

    Output:

    string(58) "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F"
    string(29) "Reloj Nami � SKMEI � Acu�tico"
    string(34) "Reloj Nami – SKMEI – Acuático"
    

    See also: UTF-8 all the way through

    Be warned that text encoding is rarely obvious just from looking at the data, and even the functions that purport to detect the encoding are only making educated guesses. Here the 0x96 bytes are the only giveaway: without the dashes, the string would decode identically as ISO-8859-1 and cp1252, and it would not be possible to know for certain which encoding it was.
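
    A short sketch of why detection is guesswork: the bytes from the question pass validation as both ISO-8859-1 and Windows-1252, and only UTF-8 can be ruled out. mb_check_encoding() is a standard mbstring function; the variable names are illustrative.

    ```php
    <?php
    $bytes = hex2bin('52656C6F6A204E616D69209620534B4D4549209620416375E17469636F');

    // 0x96 and 0xE1 are not valid UTF-8 sequences here, so UTF-8 is excluded.
    $isUtf8 = mb_check_encoding($bytes, 'UTF-8');

    // Both single-byte encodings accept the same bytes; validation alone
    // cannot tell them apart.
    $isLatin1 = mb_check_encoding($bytes, 'ISO-8859-1');
    $isCp1252 = mb_check_encoding($bytes, 'Windows-1252');

    var_dump($isUtf8, $isLatin1, $isCp1252);
    ```

    A validity check like this can rule encodings out, but it cannot confirm which of the remaining candidates the author actually used.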

    Text encoding is important metadata that must be tracked alongside the data itself.
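
    One way to do that, as a rough sketch only: store an encoding label next to the bytes and convert to UTF-8 once, at the point where the data enters your application. The toUtf8() helper and the record layout below are hypothetical, not part of PHP or of the original code.

    ```php
    <?php
    // Hypothetical helper: converts bytes to UTF-8 using the encoding label
    // stored with them, instead of guessing from the bytes.
    function toUtf8(string $bytes, string $encoding): string
    {
        if (!mb_check_encoding($bytes, $encoding)) {
            throw new InvalidArgumentException("Bytes are not valid $encoding");
        }
        return mb_convert_encoding($bytes, 'UTF-8', $encoding);
    }

    // The record carries its encoding as explicit metadata (illustrative shape).
    $record = [
        'encoding' => 'Windows-1252',
        'name'     => hex2bin('52656C6F6A204E616D69209620534B4D4549209620416375E17469636F'),
    ];

    $name = toUtf8($record['name'], $record['encoding']);
    echo $name, "\n"; // Reloj Nami – SKMEI – Acuático
    ```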