get UNICODE character instead of HEX - cURL PHP


I am using this scraper for IMDB, and the problem is that some characters come back as HEX codes such as &#xEF; instead of the UNICODE character (ï). I call the scraper through cURL, and the response is a UTF-8 encoded string: when I check the encoding of the string with mb_detect_encoding(), it returns UTF-8.

$html = $this->geturl("{$imdbUrl}combined");
echo mb_detect_encoding($html); // prints UTF-8

So I have a string with some HEX values inside, like this for example:

$var = 'Sa&#xEF;d Taghmaoui';

I tried passing $html through utf8_decode(), but no luck: some characters are still in HEX.

So I have a few questions:

1- What's the best solution for this? I can imagine different approaches, for example reading the string and using a REGEX to replace every HEX code with its character, but I am not sure whether that is the best solution, and I also don't know how to write the REGEX for it.

2- Can this be solved through cURL? I mean, is there some configuration option that sets the encoding of cURL to UTF-8, for example?

I have also tried the functions recode_string, iconv and mb_convert_encoding.


Solution

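  • Well, basically my problem is that the response from the scraper is already UTF-8 encoded, but the markup itself still contains HTML entities written in HEX. Encoding conversions leave those entities untouched, so before printing the text I need to process the data with these functions: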

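    // Decode the entities into real UTF-8 characters, then re-escape &, < and > for output.
    $var = 'Sa&#xEF;d Taghmaoui';
    echo htmlspecialchars(html_entity_decode($var, ENT_QUOTES, 'UTF-8'), ENT_NOQUOTES, 'UTF-8');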
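
    The same two calls can be wrapped in a small helper so that every scraped field goes through one place. This is only a minimal sketch: the function name decodeEntitiesForHtml is hypothetical (it is not part of the scraper), and the sample string assumes the scraped text contains a HEX entity such as &#xEF;.

```php
<?php
// Hypothetical helper, not part of the original scraper.
// Assumes $text is valid UTF-8 that may contain HTML entities such as "&#xEF;".
function decodeEntitiesForHtml(string $text): string
{
    // Turn named and numeric entities (including quotes) into UTF-8 characters.
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Re-escape &, < and > so the result can be printed into an HTML page;
    // ENT_NOQUOTES leaves single and double quotes as they are.
    return htmlspecialchars($decoded, ENT_NOQUOTES, 'UTF-8');
}

$var = 'Sa&#xEF;d Taghmaoui';
echo decodeEntitiesForHtml($var), PHP_EOL; // Saïd Taghmaoui
```

    The htmlspecialchars() step matters because the decoded text may itself contain & or <, which would otherwise end up unescaped in the page. As far as I can tell, no cURL option helps here: cURL only transfers the bytes of the response, so the decoding has to happen afterwards.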