I'm reading an XML file generated by a third-party application that includes the following:
<Cell>
<Comment ss:Author="Mark Baker">
<ss:Data xmlns="http://www.w3.org/TR/REC-html40"><B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000">Mark Baker:</Font></B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000"> Comment 1 - No align</Font></ss:Data>
</Comment>
</Cell>
What I'm trying to do is access the raw data from the Cell->Comment->Data element either "as is" or as an actual block of (X)HTML markup (preferably the latter).
if (isset($cell->Comment)) {
    echo 'comment found<br />';
    $commentAttributes = $cell->Comment->attributes($namespaces['ss']);
    if (isset($commentAttributes->Author)) {
        echo 'Author: ',(string)$commentAttributes->Author,'<br />';
    }
    $commentData = $cell->Comment->children($namespaces['ss']);
    var_dump($commentData);
    echo '<br />';
}
gives me:
comment found
Author: Mark Baker
object(SimpleXMLElement)#130 (2) { ["@attributes"]=> array(1) { ["Author"]=> string(10) "Mark Baker" } ["Data"]=> object(SimpleXMLElement)#129 (0) { } }
while
if (isset($cell->Comment)) {
    echo 'comment found<br />';
    $commentAttributes = $cell->Comment->attributes($namespaces['ss']);
    if (isset($commentAttributes->Author)) {
        echo 'Author: ',(string)$commentAttributes->Author,'<br />';
    }
    $commentData = $cell->Comment->Data->children();
    var_dump($commentData);
    echo '<br />';
}
gives me:
comment found
Author: Mark Baker
object(SimpleXMLElement)#129 (2) { ["B"]=> object(SimpleXMLElement)#118 (1) { ["Font"]=> string(11) "Mark Baker:" } ["Font"]=> string(21) " Comment 1 - No align" }
Unfortunately, SimpleXML seems to be treating the whole element as a series of XML nodes. I'm sure I should be able to get this as raw data without complex looping or feeding the element to a DOM parser, perhaps by using the xmlns="http://www.w3.org/TR/REC-html40" namespace to extract it cleanly, but I can't figure out how.
Any help appreciated.
A more complex example of the XML data:
<Cell>
<Comment ss:Author="Mark Baker">
<ss:Data xmlns="http://www.w3.org/TR/REC-html40">
<B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000">Mark Baker:</Font></B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000"> </Font><B><Font html:Face="Tahoma" x:Family="Swiss" html:Size="8" html:Color="#000000">Rich </Font><U><Font html:Face="Tahoma" x:Family="Swiss" html:Size="8" html:Color="#FF0000">Text </Font></U><Font html:Face="Tahoma" x:Family="Swiss" html:Size="8" html:Color="#000000">Comment</Font></B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000"> Center Aligned</Font>
</ss:Data>
</Comment>
</Cell>
I've gone with a quick and dirty solution for the time being. In the longer term, I'll switch to using XMLReader (for all the reasons mentioned)... I just don't have the time to rewrite all the existing SimpleXML code at the moment.
I've gone with:
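// Serialize the <ss:Data> element, including its own start and end tags
$node = $cell->Comment->Data->asXML();
// Drop the 49-character start tag (<ss:Data xmlns="http://www.w3.org/TR/REC-html40">) and the 10-character end tag (</ss:Data>)
$comment = substr($node,49,-10);
// Remove the remaining markup, leaving only the plain text
$comment = strip_tags($comment);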
While I'd prefer to keep the HTML markup, that will require additional work, so I'm simply stripping all the markup, leaving me with the plain text (which is the critical element).
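For anyone who does want to keep the markup, a possible way to avoid the hard-coded offsets is to hand the Data element to DOM and serialize its child nodes one by one. This is only a sketch under assumptions: the helper name innerXml() is made up for illustration, and the namespace URIs bound to ss and html in the sample are the usual SpreadsheetML ones, which may not match what the real file declares.

```php
<?php
// Sample document. The xmlns declarations on <Cell> are assumptions;
// in the real file they would be declared on an ancestor element.
$xml = <<<'XML'
<Cell xmlns="urn:schemas-microsoft-com:office:spreadsheet"
      xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
      xmlns:html="http://www.w3.org/TR/REC-html40">
<Comment ss:Author="Mark Baker">
<ss:Data xmlns="http://www.w3.org/TR/REC-html40"><B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000">Mark Baker:</Font></B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000"> Comment 1 - No align</Font></ss:Data>
</Comment>
</Cell>
XML;

// Hypothetical helper: returns the serialized children of an element,
// i.e. everything between its start tag and its end tag.
function innerXml(SimpleXMLElement $element): string
{
    $dom = dom_import_simplexml($element);
    $markup = '';
    foreach ($dom->childNodes as $child) {
        // saveXML() with a node argument serializes just that node
        $markup .= $dom->ownerDocument->saveXML($child);
    }
    return $markup;
}

$ssNs = 'urn:schemas-microsoft-com:office:spreadsheet'; // assumed URI for the ss prefix
$cell = simplexml_load_string($xml);
$data = $cell->children($ssNs)->Comment->children($ssNs)->Data;

$comment = innerXml($data);     // markup without the <ss:Data> wrapper
$plain   = strip_tags($comment); // plain text, as in the quick fix above

echo $comment, "\n", $plain, "\n";
```

Because the wrapper tags are never serialized in the first place, nothing depends on the exact length of the start tag, so the same call should also cope with the more complex example.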
While this is a far from perfect solution, it does what I need it to do (for the moment), and I can move on to the next item in my "to do" list, having already added a new item of "rewrite using XMLReader" to that list.
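As a note for that rewrite, a rough idea of what the XMLReader version of the comment extraction might look like. Again, this is a sketch under assumptions rather than tested production code: the sample document and the namespace URI for the ss prefix are assumed, and the string returned by readInnerXml() may carry extra namespace declarations on the child elements.

```php
<?php
// Sample document. The xmlns declarations on <Cell> are assumptions;
// in the real file they would be declared on an ancestor element.
$xml = <<<'XML'
<Cell xmlns="urn:schemas-microsoft-com:office:spreadsheet"
      xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
      xmlns:html="http://www.w3.org/TR/REC-html40">
<Comment ss:Author="Mark Baker">
<ss:Data xmlns="http://www.w3.org/TR/REC-html40"><B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000">Mark Baker:</Font></B><Font html:Face="Tahoma" html:Size="8" html:Color="#000000"> Comment 1 - No align</Font></ss:Data>
</Comment>
</Cell>
XML;

$ssNs = 'urn:schemas-microsoft-com:office:spreadsheet'; // assumed URI for the ss prefix

$reader = new XMLReader();
$reader->XML($xml);

$comments = [];
while ($reader->read()) {
    // Look for the start tag of every ss:Data element
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->localName === 'Data'
        && $reader->namespaceURI === $ssNs
    ) {
        // readInnerXml() returns the element's content, markup included
        $comments[] = $reader->readInnerXml();
    }
}
$reader->close();

foreach ($comments as $markup) {
    echo strip_tags($markup), "\n"; // plain text of each comment
}
```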
Thanks for the help. I'll be sure to revisit this thread when I am doing that rewrite.