Tags: php, parsing, html-parsing, simple-html-dom, php-parser

PHP HTML parsing: How can I get the charset value of a webpage with Simple HTML DOM Parser?


In PHP, how can I get the charset value of a webpage (utf-8, windows-255, etc.) with Simple HTML DOM Parser?

Note: it has to be done with Simple HTML DOM Parser, http://simplehtmldom.sourceforge.net

Example 1, webpage charset input:

<meta content="text/html; charset=utf-8" http-equiv="Content-Type">

Result: utf-8



Example 2, webpage charset input:

<meta content="text/html; charset=windows-255" http-equiv="Content-Type">

Result: windows-255

Edit:

I tried this (but it doesn't work):

$html = file_get_html('http://www.google.com/');
$el=$html->find('meta[content]',0);
echo $el->charset; 

What should I change? (I know that $el->charset does not work.)

Thanks


Solution

  • You'll have to match the string using a regular expression (I hope you have PCRE...).

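    // Select the <meta http-equiv="Content-Type"> element (first match).
    $el = $html->find('meta[http-equiv=Content-Type]', 0);
    // Its content attribute holds e.g. "text/html; charset=utf-8".
    $fullvalue = $el->content;
    // Capture everything after "charset=".
    preg_match('/charset=(.+)/', $fullvalue, $matches);
    echo $matches[1];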
    

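    This is not very robust (for example, it assumes the meta tag exists and that charset is the last parameter in the content value), but it should work.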
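
A slightly more defensive version of the regular-expression step could look like the sketch below. The helper name `extract_charset` is hypothetical and not part of Simple HTML DOM. It only works on the string taken from the `content` attribute, so it runs without the library. It assumes charset names consist of letters, digits, and the characters `.`, `_`, `:`, and `-`.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): pull the charset token
// out of a Content-Type style value such as "text/html; charset=utf-8".
function extract_charset(string $contentValue): ?string
{
    // Case-insensitive match; tolerates spaces around "=" and an optional
    // opening quote, and stops at the first character that is not assumed
    // to belong to a charset name.
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentValue, $m)) {
        return strtolower($m[1]);
    }
    return null; // no charset parameter present
}

echo extract_charset('text/html; charset=utf-8'), "\n";       // utf-8
echo extract_charset('text/html; charset=windows-255'), "\n"; // windows-255
var_dump(extract_charset('text/html'));                         // NULL
```

With Simple HTML DOM, the value passed in would be `$el->content`, where `$el` is the result of `$html->find('meta[http-equiv=Content-Type]', 0)`. It is worth checking that `$el` is not null first, since a page may not contain that tag. Pages that declare the encoding as `<meta charset="utf-8">` have no `content` attribute to parse. In that case, reading the `charset` attribute of the element matched by `meta[charset]` is probably the simpler route.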