Extracting content from xpath_node_set generically in pugixml


Right now I'm performing this xpath query using pugixml:

"//a/@href"

Using the following code:

std::vector<std::string> web::parser::query(std::string xpath)
{
    pugi::xpath_node_set links = document.select_nodes(xpath.c_str());
    std::cout << "OK" << std::endl;

    std::vector<std::string> urls;
    for (auto link : links)
        urls.push_back(link.attribute().value());

    return urls;
}

Note that I have to know in advance that the query selects an attribute, because I call link.attribute().value() instead of link.node().value().

Is there a way to make this query function work in both cases (attribute and PCDATA)?


Solution

  • After consulting the pugixml reference manual, I saw that xpath_node is a union of xml_node and xml_attribute.

    This means that at most one of them is non-null: for an attribute result, attribute() is set and node() is null; for a node result, it is the other way around. With that information, I can use this workaround:

    std::vector<std::string> web::parser::query(std::string xpath)
    {
        pugi::xpath_node_set node_set = document.select_nodes(xpath.c_str());
    
        std::vector<std::string> result;
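        for (auto xpath_node : node_set) {
            // A non-null attribute() means the query matched an attribute.
            if (xpath_node.attribute())
                result.push_back(xpath_node.attribute().value());
            else
                // Otherwise take the text of the node's first PCDATA/CDATA child.
                result.push_back(xpath_node.node().child_value());
        }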
    
        return result;
    }
    

    This appears to be correct in my test cases.
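
One case the workaround above may not cover is a query that selects the text nodes themselves (for example "//a/text()"): there node() is already the PCDATA node, so its text is in value() and child_value() is empty. A minimal sketch of a helper that handles all three cases follows. The name node_text is hypothetical and not part of pugixml. It is written as a template so that it relies only on attribute(), node(), value() and child_value(), which pugi::xpath_node, pugi::xml_attribute and pugi::xml_node provide. It also assumes the default char build of pugixml (no PUGIXML_WCHAR_MODE).

```cpp
#include <string>

// Hypothetical helper (not part of pugixml): returns the text carried by a
// single XPath result. XPathNode is expected to behave like pugi::xpath_node.
template <typename XPathNode>
std::string node_text(const XPathNode& xn)
{
    // Attribute result (e.g. "//a/@href"): attribute() is non-null.
    if (auto attr = xn.attribute())
        return attr.value();

    auto node = xn.node();

    // Text result (e.g. "//a/text()"): the PCDATA/CDATA node holds the text
    // itself, so value() is non-empty.
    const char* own = node.value();
    if (*own)
        return own;

    // Element result (e.g. "//a"): value() is empty for elements, so fall
    // back to the first text child, as in the loop above.
    return node.child_value();
}
```

With this in place, the body of the loop would reduce to result.push_back(node_text(xpath_node));. An element without any text child still yields an empty string, the same as in the original workaround.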