I'm parsing an HTML document using DOM -> SimpleXML:
$dom = new DOMDocument();
$dom->loadHTML($this->resource->get());
$html = simplexml_import_dom($dom);
And I want to load this fragment:
<p>
Some text here <strong class="wanna-attributes-too">with strong element!</strong>.
But there can be even <b>bold</b> tag and many others.
</p>
Then I want to modify it and export it, but the inner tags are parsed as child nodes of <p>.
That is formally right, but how can I reconstruct the original document? Is there a library that can handle tags inside text values?
How do browsers deal with this, given that it is such a common case?
Thanks
// P.S. I CAN parse documents with nodes within text, that ISN'T the problem; the problem is that the nodes lose their positions in the original text.
Update v1.0: OK, a solution could be encapsulating the text of every node that has child nodes and a text value at the same time.
The updated question is: how do I get the raw node value from SimpleXML?
From the previous HTML fragment I want something like this:
echo $nodeParagraph->rawValue;
and the output would be
Some text here <strong class="wanna-attributes-too">with strong element!</strong>.
But there can be even <b>bold</b> tag and many others.
Update v2.0: My bad, a SimpleXML node has saveXML (an alias of asXML), which does what I want. Sorry for the noise. I'll post an answer once I have built a working test.
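A minimal sketch of that idea, assuming the fragment is loaded the same way as above. Note that asXML() returns the outer markup, including the <p> tags themselves; the `innerMarkup()` helper is hypothetical (not part of SimpleXML or DOM) and shows one way to get only the content between the tags by serializing each child node in document order.

```php
<?php
$html = '<p>Some text here <strong class="wanna-attributes-too">with strong element!</strong>. '
      . 'But there can be even <b>bold</b> tag and many others.</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0);

// Outer markup via SimpleXML: includes the surrounding <p>...</p>.
$sx = simplexml_import_dom($p);
echo $sx->asXML(), "\n";

// Hypothetical helper: serialize every child (text and elements) in document order.
function innerMarkup(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

echo innerMarkup($p), "\n";
```

Because the children are walked in document order, the text and the inline tags stay interleaved exactly as they were parsed.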
So, as @jzasnake pointed out, a nice solution is to do this:
Sample (input):
<p>
Some text here <strong class="wanna-attributes-too">with strong element!</strong>.
But there can be even <b>bold</b> tag and many others.
</p>
This produces something like this in the DOM:
where the text is in the incorrect order (if you later want to reconstruct it).
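To make the problem concrete, here is a small sketch (the fragment is put on one line and loaded as XML purely for illustration). Through the SimpleXML API, the element children and the paragraph's own text are exposed separately, so nothing tells you where each child sat inside the text.

```php
<?php
$p = simplexml_load_string(
    '<p>Some text here <strong class="wanna-attributes-too">with strong element!</strong>. '
    . 'But there can be even <b>bold</b> tag and many others.</p>'
);

// children() yields only the element children, not the text between them.
$names = [];
foreach ($p->children() as $child) {
    $names[] = $child->getName();
}
echo implode(', ', $names), "\n";

// Casting to string yields only the paragraph's own text, without the child elements.
$ownText = (string) $p;
echo $ownText, "\n";
```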
A solution is to wrap every piece of text in its own node (notice the <value> tags):
<p>
<value>Some text here </value><strong class="wanna-attributes-too">with strong element!</strong><value>.
But there can be even </value><b>bold</b><value> tag and many others.</value>
</p>
The markup is a bit more verbose, but look at this:
Everything is preserved, so you are able to reconstruct the original document as is.
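A rough sketch of how the wrapping could be done with DOM before handing the tree to SimpleXML. The `wrapTextNodes()` function and the `value` element name are only illustrative assumptions, and the fragment is loaded as single-line XML to keep the example short.

```php
<?php
// Hypothetical helper: wrap each direct text child of $element in a <value> element.
function wrapTextNodes(DOMElement $element, string $wrapperName = 'value'): void
{
    // childNodes is a live list, so take a snapshot before modifying the tree.
    foreach (iterator_to_array($element->childNodes) as $child) {
        if ($child instanceof DOMText) {
            $wrapper = $element->ownerDocument->createElement($wrapperName);
            $element->replaceChild($wrapper, $child); // put the wrapper where the text was
            $wrapper->appendChild($child);            // move the text into the wrapper
        }
    }
}

$dom = new DOMDocument();
$dom->loadXML(
    '<p>Some text here <strong class="wanna-attributes-too">with strong element!</strong>. '
    . 'But there can be even <b>bold</b> tag and many others.</p>'
);
wrapTextNodes($dom->documentElement);

// Every piece of text is now an element, so SimpleXML sees it in document order.
$p = simplexml_import_dom($dom);
$order = [];
$rebuilt = '';
foreach ($p->children() as $child) {
    $order[] = $child->getName();
    $rebuilt .= $child->getName() === 'value'
        ? htmlspecialchars((string) $child, ENT_NOQUOTES) // plain text, re-escaped
        : $child->asXML();                                // inline element as markup
}
echo implode(', ', $order), "\n";
echo $rebuilt, "\n";
```

This only wraps the direct text children of one element; for nested mixed content you would have to apply it recursively.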