I am using this:
foreach ($paragraph->childNodes as $child) {
    $value .= $paragraph->ownerDocument->saveHTML($child);
}
The problem is that my $value contains &#13; wherever the original document has a line break.
Here is one part of the source HTML:
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=Generator content="Microsoft Word 12 (filtered)">
<title>SomeTitle</title>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=WordSection1>
<p class=3abstract><b>Abstract:</b> Five new anthranilic acid derivatives.</p>
</body>
</html>
Have you faced this before?
&#13; is the decimal HTML entity representation of "carriage return", so it's perfectly fine in the output.
To output the actual carriage return character instead, try setting the output encoding of the parent document to UTF-8: $paragraph->ownerDocument->encoding = 'UTF-8';
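For context, here is a minimal sketch of how that suggestion might fit into the loop from the question. The helper name innerHtml and the sample $html string (with CRLF line endings) are assumptions made for illustration, not part of the original code. Whether the carriage returns come out as raw characters or as entities may depend on your PHP and libxml2 versions, so check the output on your own setup.

```php
<?php
// Hypothetical helper (not from the question): returns the inner HTML of $node.
function innerHtml(DOMNode $node): string
{
    $doc = $node->ownerDocument;
    // Suggested fix: declare UTF-8 as the document encoding before serializing.
    $doc->encoding = 'UTF-8';

    $value = '';
    foreach ($node->childNodes as $child) {
        // Serialize each child node and append it to the result.
        $value .= $doc->saveHTML($child);
    }
    return $value;
}

// Assumed sample input with Windows (CRLF) line endings, modeled on the question's HTML.
$html = "<html><head>\r\n"
      . "<meta http-equiv=Content-Type content=\"text/html; charset=utf-8\">\r\n"
      . "</head><body>\r\n"
      . "<p class=3abstract><b>Abstract:</b> Five new anthranilic\r\nacid derivatives.</p>\r\n"
      . "</body></html>";

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

$paragraph = $doc->getElementsByTagName('p')->item(0);
$value = innerHtml($paragraph);
echo $value, "\n";
```

If entities still show up after this, it is worth checking how the document was loaded and which encoding it declares, since the serializer's behavior depends on both.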