Right now I'm running this XPath query with pugixml:
"//a/@href"
using the following code:
std::vector<std::string> web::parser::query(std::string xpath)
{
    pugi::xpath_node_set links = document.select_nodes(xpath.c_str());
    std::cout << "OK" << std::endl;
    std::vector<std::string> urls;
    for (auto link : links)
        urls.push_back(link.attribute().value());
    return urls;
}
Note that the code has to know in advance that the query selects an attribute, because I call link.attribute().value() instead of link.node().value().
Is there a way to make this query function work in both cases (attribute and PCDATA)?
After consulting the pugixml reference manual, I saw that xpath_node is a union of xml_node and xml_attribute.
This means that at most one of them is non-null: either one of them is null or both are. With that information, I can use this workaround:
std::vector<std::string> web::parser::query(std::string xpath)
{
    pugi::xpath_node_set node_set = document.select_nodes(xpath.c_str());
    std::vector<std::string> result;
    for (auto xpath_node : node_set) {
        // A non-null attribute handle means the query selected an attribute.
        if (xpath_node.attribute())
            result.push_back(xpath_node.attribute().value());
        else
            result.push_back(xpath_node.node().child_value());
    }
    return result;
}
This appears to work correctly in my test cases.
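To make the "either a node, an attribute, or nothing" idea concrete without depending on pugixml, here is a small sketch of the same dispatch using std::variant (requires C++17). The names Node, Attribute, XPathResult, text_of and collect are hypothetical stand-ins invented for this illustration; they are not pugixml types, and pugixml does not implement xpath_node this way.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT pugixml types.
struct Node      { std::string child_text; };  // element with text content
struct Attribute { std::string value; };       // attribute with a value

// Models "a node, an attribute, or nothing (null)": at most one is present.
using XPathResult = std::variant<std::monostate, Node, Attribute>;

// Returns the attribute value or the node's text; empty string for null.
std::string text_of(const XPathResult& r)
{
    if (const auto* attr = std::get_if<Attribute>(&r))
        return attr->value;
    if (const auto* node = std::get_if<Node>(&r))
        return node->child_text;
    return {};
}

// Mirrors the shape of the query loop above: one string per result.
std::vector<std::string> collect(const std::vector<XPathResult>& results)
{
    std::vector<std::string> out;
    out.reserve(results.size());
    for (const auto& r : results)
        out.push_back(text_of(r));
    return out;
}
```

Passing a vector that holds an Attribute, a Node and a std::monostate to collect yields the attribute's value, the node's text and an empty string, in that order. This is the same three-way split the workaround relies on.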