I am parsing an HTML document with libxml2 and removing some elements based on XPath queries. For example, I want to remove all elements whose style attribute contains "display:none" with:
stripNode(doc, "//*[contains(@style,'display:none')]");
...
public static void stripNode(Html.Doc* doc, string xpath)
{
    Xml.XPath.Context cntx = new Xml.XPath.Context(doc);
    Xml.XPath.Object* res = cntx.eval_expression(xpath);
    if(res != null
       && res->type == Xml.XPath.ObjectType.NODESET
       && res->nodesetval != null)
    {
        for(int i = 0; i < res->nodesetval->length(); ++i)
        {
            Xml.Node* node = res->nodesetval->item(i);
            if(node != null)
            {
                node->unlink();
                node->free_list();
            }
        }
    }
    delete res;
}
However, I came across documents that have an element with "display:none" nested inside another element with "display:none". When the outer element is unlinked and freed, all of its children are freed too. But the inner element is still part of "res" and is not "null", so I get a crash because of a double free.
Is there a way to check whether a node is still part of the document or has already been freed? Alternatively, is there a way to look only for the first match of the XPath query, and then look for the next match after that node has been unlinked and freed? I guess executing
cntx.eval_expression(xpath);
again after each unlinked node would be very slow.
Thank you for your help :)
I'd suggest a different approach to achieve the same result. You can use a more specific XPath so that, when nested elements have a style attribute containing "display:none", only the outermost elements get selected:
//*[contains(@style,'display:none')][not(ancestor::*[contains(@style,'display:none')])]
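As a rough sketch of how this could be plugged into the code from the question: the wrapper below builds the refined query and passes it to the existing stripNode(). The name stripHiddenNodes is hypothetical, and the sketch assumes it lives in the same class as stripNode(), which is left unchanged.

```vala
// Sketch only: stripHiddenNodes is a hypothetical helper name, assumed to sit
// in the same class as stripNode() from the question.
public static void stripHiddenNodes(Html.Doc* doc)
{
    // Predicate shared by both parts of the query.
    string hidden = "contains(@style,'display:none')";

    // Select hidden elements that have no hidden ancestor. No node in the
    // result set can then be a descendant of another node in the set, so
    // freeing one match never frees a later match.
    string xpath = "//*[" + hidden + "][not(ancestor::*[" + hidden + "])]";

    stripNode(doc, xpath);
}
```

Because the nested matches are removed together with their outermost ancestor, a single evaluation of the query should be enough and there is no need to re-run it after each unlinked node.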