How can you skip closing HTML tag with xmlTextReader?


I'm new to libxml2 and still learning it. I need to take an action whenever I find a specific HTML tag (in the simplified example below, that action is a write to std::cout). My program below takes this action on both the opening and the closing tag that match a specified string ("B"), but I would like to act only on the opening tag. How can this be done? I couldn't work out from the libxml2 documentation whether there is a way to distinguish opening tags from closing tags, and I couldn't find a similar SO question.

The Code:

#include <iostream>
#include <string>
#include <libxml/xmlreader.h>

int main( int argc, char* argv[] )
{
  int ret;
  xmlTextReaderPtr r = xmlNewTextReaderFilename("foo.xml");

  if ( !r )
  {
    return -1;
  }

  ret = xmlTextReaderRead( r );

  while ( 1 == ret )
  {
    if ( std::string("B") == (const char*)xmlTextReaderConstName( r ) )
    {
      std::cout << "Found desired tag" << std::endl;
    }

    ret = xmlTextReaderRead( r );
  }

  if ( r )
  {
    xmlFreeTextReader( r );
  }

  return 0;
}

Compiled Like This:

>g++ --version
g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

>g++ -lxml2 -I/usr/include/libxml2 main.cpp

Run With This XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<A version="02.00.00" priority="0" reliable="false">
 <B attr1="Type_B" attr2="usb" attr3="600">
  <C/>
  <D/>
 </B>
</A>

Results In This Output:

>./a.out 
Found desired tag
Found desired tag

Whereas I'd like "Found desired tag" to only be output once, i.e. only upon encountering the opening <B> HTML tag.


Solution

  • You can use xmlTextReaderNodeType(reader) to determine what "type" of node the reader is currently positioned on. The possible values are listed in the xmlReaderTypes enum in xmlreader.h.

    In this case, you'll want to distinguish XML_READER_TYPE_ELEMENT (reported for an opening tag) from XML_READER_TYPE_END_ELEMENT (reported for a closing tag) and ignore the latter.