
Read contents from xml file and store in an array


I'm working with XML for the first time and I'm having trouble storing the contents of an XML file in an array. I'm using libxml2 to parse the file, and I can read the data and print it. The code is given below:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>
#include <wchar.h>

wchar_t buffer[7][50]={"\0"};

static void parseDoc(const char *docname) 
{

    xmlDocPtr doc;
    xmlNodePtr cur;
    xmlChar *key;
    int i=0;
    doc = xmlParseFile(docname);

    if (doc == NULL ) {

    fprintf(stderr,"Document not parsed successfully. \n");
     return;
    }

    cur = xmlDocGetRootElement(doc);

    if (cur == NULL) 
    {
      fprintf(stderr,"empty document\n");
      xmlFreeDoc(doc);
      return;
    }

    cur = cur->xmlChildrenNode;

    while (cur != NULL) 
    {
        key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        wmemcpy(buffer[i],(wchar_t*)(key),size(key));   /*segmentation fault at this stage*/        
        printf("Content : %s\n", key);
        xmlFree(key);
        i++;
        cur = cur->next;
    }
    xmlFreeDoc(doc);
    return;
}

int main(void) 
{
   const char *docname="/home/workspace/TestProject/Text.xml";
   parseDoc (docname);
   return (1);
 }

The sample xml file is provided below

 <?xml version="1.0"?>
 <story>
  <author>John Fleck</author>
  <datewritten>June 2, 2002</datewritten>
  <keyword>example keyword</keyword>
  <headline>This is the headline</headline>
  <para>This is the body text.</para>
 </story>

When the file contents are printed to the screen, the output is as follows:

Content : null

Content : John Fleck

Content : null

Content : June 2, 2002

Content : null

Content : example keyword

Content : null

Content : This is the headline

Content : null

Content : This is the body text.

I suspect that the null content in a few places is breaking the copy and causing the segmentation fault. Please let me know how to fix the problem, and whether there is a better way to do this. I have done a similar XML file read using the MSXML parser, but this is my first time with Linux APIs.

EDIT: The copy is now performed as below, but the contents of the wchar_t array are garbled. Further help would be appreciated.

while (cur != NULL) {

    key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
    if(key!=NULL)
    {
        wmemcpy(DiscRead[i],(const wchar_t *)key,sizeof(key));
        i++;
    }

    printf("keyword: %s\n", key);
    xmlFree(key);

    cur = cur->next;
}

Solution

  • Your code has multiple problems:

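    • You use wchar_t for your string array. This isn't appropriate for the UTF-8 encoded strings you'll get from libxml2: casting an xmlChar pointer to wchar_t * does not convert the bytes, it only reinterprets them, which is why the array looks garbled. You should stick with xmlChar or use char.
    • You use xmlNodeListGetString to get the text content of nodes, passing cur->xmlChildrenNode as the node list. The root's children include the whitespace-only text nodes between the elements, and for a text node cur->xmlChildrenNode is NULL, so xmlNodeListGetString returns NULL. That is where the null lines in your output come from. You should simply call xmlNodeGetContent on the current node, but only if it is an element node.
    • Using xmlChildrenNode as field name is deprecated. You should use children.
    • The call to wmemcpy is dangerous: it does no bounds checking, and in your edit sizeof(key) is the size of the pointer, not the length of the string. I'd suggest something safer like strlcpy. Note that glibc only provides strlcpy since version 2.38; snprintf is a portable alternative.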
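To see why the wide-character copy goes wrong, it helps to compare how many bytes the wmemcpy call in the edited question asks for with how many bytes the string really holds. The sketch below is illustrative only: both function names are made up for this example, and the exact numbers depend on the platform's pointer and wchar_t sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Bytes that wmemcpy(dst, (const wchar_t *)key, sizeof(key)) reads:
 * sizeof(key) elements of sizeof(wchar_t) bytes each. sizeof(key) is
 * the size of the pointer, so the result is the same for every string. */
size_t bytes_requested_by_wmemcpy(const unsigned char *key)
{
    return sizeof(key) * sizeof(wchar_t);
}

/* Bytes the UTF-8 string really occupies, including the terminating NUL. */
size_t bytes_actually_present(const unsigned char *key)
{
    assert(key != NULL);
    return strlen((const char *)key) + 1;
}
```

On a typical 64-bit Linux system the first function returns 8 * 4 = 32 for every key, while "June 2, 2002" occupies only 13 bytes. The copy can therefore read past the end of a short string, and each group of four UTF-8 bytes ends up being treated as a single wide character.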

    Try something like this:

    char buffer[7][50];
    
    static void parseDoc(const char *docname)
    {
        xmlDocPtr doc;
        xmlNodePtr cur;
        xmlChar *key;
        int i = 0;
        doc = xmlParseFile(docname);
    
        if (doc == NULL) {
            fprintf(stderr, "Document not parsed successfully. \n");
            return;
        }
    
        cur = xmlDocGetRootElement(doc);
    
        if (cur == NULL) {
            fprintf(stderr, "empty document\n");
            xmlFreeDoc(doc);
            return;
        }
    
        for (cur = cur->children; cur != NULL; cur = cur->next) {
            if (cur->type != XML_ELEMENT_NODE)
                continue;
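            key = xmlNodeGetContent(cur);
            if (key == NULL)
                continue;
            /* xmlChar is unsigned char, so cast it for strlcpy. */
            strlcpy(buffer[i], (const char *)key, sizeof(buffer[i]));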
            printf("Content : %s\n", key);
            xmlFree(key);
            i++;
        }
    
        xmlFreeDoc(doc);
    }
    

    You should also check that i doesn't overrun the number of strings in your array.
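One way to do that check is to move the copy into a small helper that refuses to write once the array is full. This is a minimal sketch rather than a definitive implementation: store_entry, MAX_ENTRIES and ENTRY_LEN are hypothetical names introduced here, and snprintf is used in place of strlcpy only because it is available on every C99 system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 7   /* rows in the array, as in the code above */
#define ENTRY_LEN   50  /* bytes per row, including the terminating NUL */

char buffer[MAX_ENTRIES][ENTRY_LEN];

/* Hypothetical helper: copy text into row *count of buffer and advance
 * *count. Returns 1 on success, 0 if the array is already full.
 * Over-long input is truncated, never written past the end of the row. */
int store_entry(const char *text, int *count)
{
    assert(text != NULL && count != NULL);
    if (*count < 0 || *count >= MAX_ENTRIES)
        return 0;
    /* snprintf always NUL-terminates when the size is non-zero. */
    snprintf(buffer[*count], ENTRY_LEN, "%s", text);
    (*count)++;
    return 1;
}
```

Inside the loop, the copy and the i++ would then become a single call such as store_entry((const char *)key, &i), with the loop ending once it returns 0. Keep in mind that truncating at a fixed byte count can cut a multi-byte UTF-8 character in half, so a larger ENTRY_LEN or dynamically allocated strings may suit real data better.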