linux, xml, libxml2

libxml2 parse similar nodes from different parent nodes


I have an XML file with the following structure (simplified for the purpose of this question):

<SuperNode>
  <MediumNode Enable="YES">
    <Child attr1="1" />
    <Child attr1="2" />
    <Child attr1="3" />
  </MediumNode>
  <MediumNode Enable="YES">
    <Child attr1="4" />
    <Child attr1="5" />
  </MediumNode>
</SuperNode>

I parse it with libxml2 in C using the following logic (error checks are omitted for simplicity), which I took from the libxml2 examples on their website. I need to iterate through all attributes of the "Child" nodes of all "MediumNode" nodes:

xmlDocPtr doc = xmlParseFile("/path/to/xml");     /* check for errors */
xmlXPathContextPtr ctx = xmlXPathNewContext(doc); /* check for errors */
xmlXPathObjectPtr obj = xmlXPathEvalExpression("/*/*[@Enable=\"YES\"]/Child", ctx); /* check for errors */
xmlXPathNodeSetIsEmpty(obj->nodesetval);          /* check for errors */
int n_nodes = obj->nodesetval->nodeNr;            /* this is correctly identified as 5 */
xmlNodePtr node = obj->nodesetval->nodeTab[0];    /* list of parsed Child nodes */
xmlAttrPtr attr = NULL;
for (int i = 0; i < n_nodes; i++) {
  attr = xmlGetProp(node, "attr1");               /* correctly gets the attribute value */
  /* process attribute info */
  xmlFree(attr);
  node = node->next;
}
xmlXPathFreeObject(obj);     /* cleanup code */
xmlFreeDoc(doc);
xmlCleanupParser();

The "n_nodes" variable in the above logic correctly counts the "Child" nodes across all of the "MediumNode" nodes (the actual XML file has several MediumNode nodes). But when the for loop runs, it crashes after it has processed the "Child" nodes of only the first "MediumNode" node. In the above case, it crashes after the 3 nodes of the first "MediumNode". Is there a bug in the logic that I have written? Feel free to point it out. I have been at this for 2 days now. The XPath expression itself has been verified on the command line using the xmlstarlet binary:

xmlstarlet sel -t -c "/*/*[@Enable=\"YES\"]/Child" /path/to/xml

This outputs all five "Child" nodes properly. I am doing this on a custom embedded Linux 3.2.0 with libc 2.13 and libxml2 2.8.0.

Thanks.


Solution

  • Your code works with one small change: read each node from nodeTab[i] inside the loop instead of following node->next.

    xmlDocPtr doc = xmlParseFile("/path/to/xml");     /* check for errors */
    xmlXPathContextPtr ctx = xmlXPathNewContext(doc); /* check for errors */
    xmlXPathObjectPtr obj = xmlXPathEvalExpression(BAD_CAST "/*/*[@Enable=\"YES\"]/Child", ctx); /* check for errors */
    xmlXPathNodeSetIsEmpty(obj->nodesetval);          /* check for errors */
    int n_nodes = obj->nodesetval->nodeNr;            /* this is correctly identified as 5 */
    xmlNodePtr node;
    xmlChar *attr = NULL;                             /* xmlGetProp returns xmlChar *, not xmlAttrPtr */
    for (int i = 0; i < n_nodes; i++) {
      node = obj->nodesetval->nodeTab[i];             /* gets the ith Child node */
      attr = xmlGetProp(node, BAD_CAST "attr1");      /* gets the attribute value for the ith node */
      /* process attribute info */
      printf("found attr1=%s\n", (const char *)attr);
      xmlFree(attr);
      /* no node = node->next here: nodeTab is an array, not a sibling list */
    }
    xmlXPathFreeObject(obj);     /* cleanup code */
    xmlXPathFreeContext(ctx);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    

    The debug output from the code above looks like this:

    found attr1=1
    found attr1=2
    found attr1=3
    found attr1=4
    found attr1=5
    

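    With your code, the issue is in the for loop, not in your XPath. nodeTab is an array of node pointers, so the ith result is nodeTab[i]. node->next instead follows the sibling links in the document tree, which include the whitespace text nodes between the Child elements and end (NULL) after the last child of the first MediumNode, so the loop never reaches the second MediumNode. If you print the attribute value in the loop, you will see that it prints (null) after every correct attribute. The debug output from your code looks something like this: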

    found attr1=1
    found attr1=(null)
    found attr1=2
    found attr1=(null)
    found attr1=3
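
The (null) lines above can be reproduced without libxml2 by modelling the two traversals directly. This is only a sketch: Node is a simplified stand-in for xmlNode (not the real struct), and link_siblings, build_example, walk_by_index and walk_by_next are hypothetical names. It assumes the parser keeps whitespace text nodes between elements, so each MediumNode's child list alternates text and Child nodes.

```c
#include <stddef.h>

/* Simplified stand-in for xmlNode: only what the traversal needs. */
typedef struct Node {
    const char *attr1;   /* attribute value; NULL models a whitespace text node */
    struct Node *next;   /* next sibling under the SAME parent, or NULL */
} Node;

/* Chain `count` nodes as siblings. Odd positions are Child elements taking
   `values` in order; even positions are text nodes. Pointers to the Child
   elements are appended to `tab` (the stand-in for nodeTab).
   Returns the new length of `tab`. */
static size_t link_siblings(Node *nodes, size_t count, const char *const *values,
                            Node **tab, size_t tab_len)
{
    for (size_t i = 0; i < count; i++) {
        nodes[i].attr1 = (i % 2 == 1) ? values[i / 2] : NULL;
        nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : NULL;
        if (i % 2 == 1)
            tab[tab_len++] = &nodes[i];
    }
    return tab_len;
}

/* Two separate sibling chains, one per MediumNode; `tab` needs room for 5. */
size_t build_example(Node **tab)
{
    static Node first[7], second[5];
    static const char *const v1[] = {"1", "2", "3"};
    static const char *const v2[] = {"4", "5"};
    size_t n = link_siblings(first, 7, v1, tab, 0);
    return link_siblings(second, 5, v2, tab, n);
}

/* Fixed pattern: index the result array. Returns how many values were found. */
size_t walk_by_index(Node *const *tab, size_t n, const char **out)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = tab[i]->attr1;
        if (out[i] != NULL)
            found++;
    }
    return found;
}

/* Pattern from the question: start at tab[0] and follow ->next.
   The NULL guard stands in for the crash the real code hits on longer inputs. */
size_t walk_by_next(Node *const *tab, size_t n, const char **out)
{
    size_t found = 0;
    const Node *node = tab[0];
    for (size_t i = 0; i < n; i++) {
        out[i] = (node != NULL) ? node->attr1 : NULL;
        if (out[i] != NULL)
            found++;
        if (node != NULL)
            node = node->next;
    }
    return found;
}
```

After build_example fills a five-entry array, walk_by_index finds all five values, while walk_by_next finds only three, with NULL in every second slot, because it never leaves the first sibling chain.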