This might seem like a simple question, but I have been looking for an XML parser to use in one of my applications, which runs on Linux.
I am using Expat and have parsed an XML file that I read in. However, the output is the same as the input.
This is the file I am reading in:
<?xml version="1.0" encoding="utf-8"?>
<books>
<book>
<id>1</id>
<name>Hello, world!</name>
</book>
</books>
However, after I have parsed this, I get exactly the same thing as the output. It makes me wonder what the parser is for.
One more thing: Expat seems quite difficult to use. My code is below. It reads from a file, but my application will have to parse a buffer received from a socket rather than from a file. Does anyone have a sample of this?
#include <stdio.h>
#include <expat.h>

/* buff is a caller-supplied scratch buffer of buff_size bytes. */
int parse_xml(char *buff, size_t buff_size)
{
    FILE *fp = fopen("mybook.xml", "r");
    if (fp == NULL)
    {
        printf("Failed to open file\n");
        return 1;
    }

    XML_Parser parser = XML_ParserCreate(NULL);
    if (parser == NULL)
    {
        fclose(fp);
        return 1;
    }

    int done;
    int status = 0;
    do
    {
        /* Read at most buff_size bytes; sizeof(buff) would only be the size of the pointer. */
        size_t len = fread(buff, 1, buff_size, fp);
        /* A short read means end of file (or a read error). */
        done = len < buff_size;
        if (XML_Parse(parser, buff, (int)len, done) == XML_STATUS_ERROR)
        {
            printf("%s at line %lu\n", XML_ErrorString(XML_GetErrorCode(parser)),
                   (unsigned long)XML_GetCurrentLineNumber(parser));
            status = 1;
            break;
        }
    }
    while (!done);

    /* Release the file and the parser on both the success and the error path. */
    fclose(fp);
    XML_ParserFree(parser);
    return status;
}
It took me a while to wrap my head around XML parsing (though I do it in Perl, not C). Basically, you register callback functions. The parser calls your callback for each node and passes in the juicy bits for that node (like its plain text and any attributes). You have to maintain some kind of state yourself, such as a hash tree you plug values into, or a string that collects all the text content but none of the XML markup.
Just remember that XML is not linear, so it doesn't make much sense to process it as one long hunk of text. Instead, treat it as a tree. Good luck.