I'm using pugixml's XPath functions to find certain nodes within an HTML document (downloaded through curl).
I am using:
pugi::xml_document doc;
doc.load_buffer(htmlcontent.c_str(), htmlcontent.size());
pugi::xpath_node example= doc.select_single_node("//h2[@class='tv_header']");
std::cout << example.node();
which returns 0 nodes. I know that this node exists in the document: when I put just that node in a string, it is found successfully. Why is the node not found within the full document? Is there some issue with the encoding of the HTML document?
It is likely that the parsing of your document stops before encountering the node.
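HTML documents generally cannot be parsed by XML parsers: HTML permits constructs that are not well-formed XML, such as unclosed <br> or <img> tags and unquoted attribute values. Unless your document is valid XHTML, you need to use an HTML parser.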
To verify this, check the result object returned by load_buffer, e.g.:
pugi::xml_parse_result res = doc.load_buffer(htmlcontent.c_str(), htmlcontent.size());
std::cout << "Parsing result: " << res.description() << std::endl;
if (!res) std::cout << "Parsing stopped at offset " << res.offset << std::endl;
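Once you have the offset, it can help to look at the markup around it to see which construct stopped the parser. Below is a minimal sketch of that idea; context_at is a hypothetical helper written for this answer, not part of pugixml, and it only assumes that res.offset is a byte offset into the buffer you passed to load_buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of pugixml): returns up to `radius` characters
// before `offset` plus up to `radius` characters starting at `offset`, so the
// markup where parsing stopped can be inspected.
std::string context_at(const std::string& buffer, std::ptrdiff_t offset, std::size_t radius = 40)
{
    if (buffer.empty()) return std::string();

    // Clamp the offset into the valid index range of the buffer.
    std::size_t pos = offset < 0
        ? 0
        : std::min(static_cast<std::size_t>(offset), buffer.size() - 1);

    // Compute the window [begin, end) without underflow or overrun.
    std::size_t begin = pos > radius ? pos - radius : 0;
    std::size_t end = std::min(buffer.size(), pos + radius);

    return buffer.substr(begin, end - begin);
}
```

You could then print it next to the description, for example: if (!res) std::cout << context_at(htmlcontent, res.offset) << std::endl;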