I have this simple code:
$input = '<p>ěščřžýáíé</p><p><img alt="" src="http://www.test.com/img.jpg" style="width: 100px; height: 100px;"></p>';
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->encoding = 'UTF-8';
$dom->loadHTML($input, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$imgs = $dom->getElementsByTagName('img');
foreach ($imgs as $img) {
    $src = $img->getAttribute('src');
    $style = $img->getAttribute('style');
    $newSrc = 'http://www.test.com/img001.jpg';
    $img->setAttribute('src', $newSrc);
}
$content = $dom->saveHTML();
The problem is that the output is encoded. I expect the same characters as in the input. I tried decoding the result, without success. Is something wrong with how I am using the DOM object?
<p>ěščřžýáíé<p><img alt="" src="http://www.test.com/img001.jpg" style="width: 100px; height: 100px;"></p></p>
saveHTML() has a few 'features' which I don't fully understand, but if you pass it a particular document node and then run utf8_decode() on the result, it works:
$content = utf8_decode($dom->saveHTML($dom->documentElement));
gives...
<p>ěščřžýáíé<p><img alt="" src="http://www.test.com/img001.jpg" style="width: 100px; height: 100px;"></p></p>
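A likely explanation is that loadHTML() assumes a single-byte encoding (ISO-8859-1) when the markup carries no charset declaration, so each UTF-8 byte is read as a separate character, and utf8_decode() simply undoes that. Since utf8_decode() is deprecated as of PHP 8.2, here is a sketch of a commonly used alternative: prefix the fragment with an XML encoding declaration as a hint to the parser, so no decoding is needed afterwards. The prefix string is my addition, not part of the original code, and its effect may depend on the libxml version PHP was built against, so treat this as something to try rather than a guaranteed fix.

```php
<?php
// Sketch only: the '<?xml encoding="UTF-8">' prefix is an assumed workaround
// that hints to libxml that the fragment is UTF-8. It is not in the original code.
$input = '<p>ěščřžýáíé</p><p><img alt="" src="http://www.test.com/img.jpg" style="width: 100px; height: 100px;"></p>';

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML(
    '<?xml encoding="UTF-8">' . $input,
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Same attribute rewrite as in the question.
foreach ($dom->getElementsByTagName('img') as $img) {
    $img->setAttribute('src', 'http://www.test.com/img001.jpg');
}

// Serialize starting from the root element, as in the answer above,
// but without the utf8_decode() step.
$content = $dom->saveHTML($dom->documentElement);
echo $content, PHP_EOL;
```

If you need to stay with the decoding approach on PHP 8.2 or later, mb_convert_encoding($html, 'ISO-8859-1', 'UTF-8') is the documented replacement for utf8_decode($html).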