Tags: php, json, html-parsing, special-characters, simple-html-dom

(PHP) Simple HTML DOM parser: HTML symbols


I'm trying to get usernames from this website and this is what I've done:

$divs = $html->find('div[class=micro-home-recent-review review-item]');
// Prepare once; the placeholder keeps quotes in a name from breaking the query
$stmt = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
foreach ($divs as $div) {
     $username = $div->find('div[class=tooltip-fullname]', 0)->find('b', 0)->plaintext;
     // I've tried using iconv but apparently it doesn't work
     $username = iconv(mb_detect_encoding($username), "UTF-8", $username);
     $stmt->execute([$username]);
}

Then the newly inserted records in my database are:

[screenshot: database records]

As you can see, most of the names are stored as HTML entities. Browsers display these normally, but they come out garbled when the data is output as JSON. The same problem happens when I fetch reviews; below is the sample JSON of a review:

[screenshot: sample JSON of a review]

I need the JSON to show data in my Android app, so unless this is solved the data won't be displayed properly. What could be a possible solution? I'd really appreciate your help and suggestions.


Solution

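  • Try the html_entity_decode() function: apply it to the scraped text before inserting it into the database, so the entities are stored as the actual UTF-8 characters.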
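
A minimal sketch of that suggestion, assuming the scraped names contain numeric or named HTML entities. The helper name decode_scraped_text() and the sample string are made up for illustration; they are not part of Simple HTML DOM or of the original code. html_entity_decode() and json_encode() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): turn entity-encoded
// text such as "Nguy&#7877;n" back into real UTF-8 characters.
function decode_scraped_text(string $text): string
{
    // ENT_QUOTES also decodes &#039; and &quot;, ENT_HTML5 uses the HTML5
    // named-entity table, and 'UTF-8' sets the encoding of the result.
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}

// Sample input invented for this example.
$raw = 'Nguy&#7877;n V&#259;n &amp; Co.';
$username = decode_scraped_text($raw);

// JSON_UNESCAPED_UNICODE keeps non-ASCII characters readable in the output
// instead of writing them as \uXXXX escapes.
echo json_encode(['name' => $username], JSON_UNESCAPED_UNICODE), PHP_EOL;
```

In the loop from the question, the decoding would probably go where the iconv() call is now, right after reading ->plaintext and before the insert. If the Android client parses JSON with a standard library, the \uXXXX escapes that json_encode() emits by default should also decode fine, so JSON_UNESCAPED_UNICODE is mostly there for readability.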