
What is an XML parser? Using Expat


This might seem like a simple question.

I have been looking for an XML parser to use in one of my applications, which runs on Linux.

I am using Expat and have parsed an XML file by reading it in. However, the output is the same as the input.

This is my file I am reading in:

<?xml version="1.0" encoding="utf-8"?>
    <books>
         <book>
              <id>1</id>
              <name>Hello, world!</name>
         </book>
    </books>

However, after I have parsed this, the output is exactly the same as the input. It makes me wonder what the parser is for.

One more thing: I am using Expat, which seems quite difficult to use. My code is below. It reads in a file, but my application will have to parse a buffer received from a socket, not a file. Does anyone have any samples of this?

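#include <stdio.h>
#include <expat.h>

/* Parse mybook.xml through Expat using the caller's buffer of buff_size bytes.
   No handlers are registered here, so Expat only checks well-formedness. */
int parse_xml(char *buff, size_t buff_size)
{
    FILE *fp = fopen("mybook.xml", "r");
    if(fp == NULL)
    {
        printf("Failed to open file\n");
        return 1;
    }

    XML_Parser parser = XML_ParserCreate(NULL);
    if(parser == NULL)
    {
        fclose(fp);
        return 1;
    }

    int done;
    int status = 0;

    do
    {
        /* Read at most buff_size bytes; a short read means end of file.
           (sizeof(buff) on a pointer would only give the pointer size.) */
        size_t len = fread(buff, 1, buff_size, fp);
        done = len < buff_size;

        if(XML_Parse(parser, buff, (int)len, done) == XML_STATUS_ERROR)
        {
            printf("%s at line %lu\n", XML_ErrorString(XML_GetErrorCode(parser)),
                                       (unsigned long)XML_GetCurrentLineNumber(parser));
            status = 1;
            break;
        }
    }
    while(!done);

    /* Release resources on both the success and the error path. */
    fclose(fp);
    XML_ParserFree(parser);

    return status;
}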

Solution

  • It took a while to wrap my head around XML parsing (though I do it in Perl, not C). Basically, you register callback functions. The parser calls your callbacks as it encounters each node and passes in the details it has found (such as the element name, any attributes, and the plain text). You have to maintain some kind of state information yourself, such as a hash tree you plug values into, or a string that collects all the text content but none of the XML markup.

    Just remember that XML is not linear and it doesn't make much sense to parse it like a long hunk of text. Instead, you parse it like a tree. Good luck.