I have a question about preg_match. Suppose I try to fetch text like this: "Århus er en by i Danmark" (which means "Århus is a city in Denmark"):
preg_match( "#<div id=[\"']faktaDiv[\"']>(.*?)</div>#si", $webside, $a2 );
echo $a2[1];
Then the output will be:
�rhus er en by i Danmark means �rhus is a city in Denmark
How can I fix this? Basically it needs to allow æ ø å.
For the regex approach you need the u modifier. For a full list of PHP's modifiers, see http://php.net/manual/en/reference.pcre.pattern.modifiers.php; the i and s you are currently using are two other modifiers.
preg_match( "#<div id=[\"']faktaDiv[\"']>(.*?)</div>#siu", $webside, $a2 );
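As a minimal sketch of how that call might be used in full: the sample string below is a hypothetical stand-in for $webside and is assumed to be UTF-8 (the script file itself must be saved as UTF-8 too). Note that preg_match fills $a2 with an array, so the captured text is in $a2[1], and that with the u modifier a subject that is not valid UTF-8 makes the call fail rather than match.

```php
<?php
// Hypothetical sample input standing in for $webside; assumed to be UTF-8.
$webside = '<div id="faktaDiv">Århus er en by i Danmark</div>';

$result = preg_match("#<div id=[\"']faktaDiv[\"']>(.*?)</div>#siu", $webside, $a2);

if ($result === 1) {
    // $a2[0] is the whole match; $a2[1] is the first capture group.
    echo $a2[1], "\n";
} elseif ($result === false) {
    // With the u modifier, invalid UTF-8 in the subject causes a failure.
    echo 'preg_match failed, error code: ', preg_last_error(), "\n";
} else {
    echo "No match\n";
}
```

If the failure branch is hit, the page is probably in another encoding (for example ISO-8859-1) and would need converting to UTF-8 first.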
It looks like you are parsing HTML, though, so I'd use DOMDocument to parse that string.
$doc = new DOMDocument();
$doc->loadHTML('<div id="faktaDiv">Test Stuff</div>');
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('id') == 'faktaDiv') {
        echo $div->nodeValue;
    }
}
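Since the question is about æ ø å, here is a hedged sketch of the same loop with UTF-8 input. DOMDocument::loadHTML tends to treat input as ISO-8859-1 unless the markup declares a charset, so the XML encoding prefix below is a commonly used workaround, not the only option. The variable names $html and $found are illustrative assumptions.

```php
<?php
// Hypothetical sample input; assumed to be a UTF-8 string.
$html = '<div id="faktaDiv">Århus er en by i Danmark</div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
// The XML encoding prefix hints to libxml that the input is UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$found = null;
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') === 'faktaDiv') {
        $found = $div->nodeValue;
        echo $found, "\n";
    }
}
```

If the page already carries a correct charset declaration in its head, the prefix should not be needed.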
To pull the title, you should use a parser like this:
$doc = new DOMDocument();
$doc->loadHTML('<title>Test Stuff</title>');
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
echo $title;
As far as I know there should only be one title on a page. If this isn't the case, take off the ->item(0)->nodeValue and loop through the returned node list.
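A rough sketch of that loop, assuming a hypothetical page that contains more than one title element (the markup and the $titles variable are made up for illustration):

```php
<?php
// Hypothetical markup with two title elements, only to illustrate the loop.
$doc = new DOMDocument();
$doc->loadHTML('<html><head><title>First</title><title>Second</title></head><body></body></html>');

$titles = [];
// getElementsByTagName returns a DOMNodeList, which foreach can iterate.
foreach ($doc->getElementsByTagName('title') as $node) {
    $titles[] = $node->nodeValue;
}

print_r($titles);
```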
PHP Demo: https://eval.in/502432