c++ yaml yaml-cpp

How to tell if I've already processed a node


I am processing large files that contain many redundant values (using YAML's anchors and references). The processing I do on each structure is expensive, and I would like to detect whether I'm looking at a reference to an anchor I've already processed. In Python (with python-yaml), I did this by simply building a dictionary keyed by id(node). In yaml-cpp, however, Node is a reference type: a Node object is only a handle, so its own address does not identify the underlying node, and the same approach does not seem to work. Any suggestions?

This is similar to "Retrieve anchor & alias string in yaml-cpp from document". That feature would be sufficient to solve my problem, but it is not necessary: if I could somehow get a hash based on the internal address of the node, for example, that would be fine.

The expensive thing I'm doing is computing a hash of each node including itself and its children.


Solution

  • Here is a patch (against yaml-cpp 0.5.1) that seems to do what I need. Proceed with caution.

    diff -nr include/yaml-cpp/node/detail/node.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/detail/node.h
    a13 1
    #include <boost/functional/hash.hpp>
    a24 1
                std::size_t identity_hash() const { return boost::hash<node_ref*>()(m_pRef.get()); }
    diff -nr include/yaml-cpp/node/impl.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/impl.h
    a175 5
        inline std::size_t Node::identity_hash() const
        {
            return m_pNode->identity_hash();
        }
    
    diff -nr include/yaml-cpp/node/node.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/node.h
    a55 2
            std::size_t identity_hash() const;
    

    I can then use the following to make an unordered_map with YAML::Node as the key:

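    #include <cstddef>
    #include <functional>
    #include <yaml-cpp/yaml.h>

    namespace std {
      // Hash a YAML::Node by identity; requires the patched identity_hash() above.
      template <>
      struct hash<YAML::Node> {
        size_t operator()(const YAML::Node& node) const {
          return node.identity_hash();
        }
      };
    }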
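
To illustrate how an identity-keyed map avoids repeating the expensive work, here is a minimal sketch that does not depend on yaml-cpp or on the patch. `Handle`, `Data`, `IdentityHash`, `Cache`, and `structural_hash` are hypothetical names made up for this example: `Handle` stands in for a reference type like YAML::Node, and its `operator==` compares identity (the role that `Node::is` plays in yaml-cpp). The hash-combining step is an arbitrary choice for illustration, not the hash used in the question.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Data;

// Hypothetical stand-in for a reference-type node: copies of a Handle
// share one underlying Data object, much like copies of a YAML::Node.
struct Handle {
  std::shared_ptr<Data> ptr;
  // Identity comparison: equal only if both handles refer to the same Data.
  bool operator==(const Handle& other) const { return ptr == other.ptr; }
};

struct Data {
  std::string scalar;
  std::vector<Handle> children;
};

// Hash the address of the shared object, not its contents.
struct IdentityHash {
  std::size_t operator()(const Handle& h) const {
    return std::hash<const Data*>()(h.ptr.get());
  }
};

using Cache = std::unordered_map<Handle, std::size_t, IdentityHash>;

// Memoized hash over a node and its children. `computed` counts how many
// nodes were actually hashed rather than served from the cache.
std::size_t structural_hash(const Handle& h, Cache& cache, int& computed) {
  auto it = cache.find(h);
  if (it != cache.end()) return it->second;  // already processed this node

  ++computed;
  std::size_t seed = std::hash<std::string>()(h.ptr->scalar);
  for (const Handle& child : h.ptr->children) {
    std::size_t child_hash = structural_hash(child, cache, computed);
    // Illustrative combine step (in the style of boost::hash_combine).
    seed ^= child_hash + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }
  cache.emplace(h, seed);
  return seed;
}
```

With a root whose children are three copies of the same handle (the analogue of three aliases to one anchor), only the root and the shared node are hashed; the other two references are cache hits. Note that two distinct nodes with identical contents are still treated as different keys, since the cache is keyed on identity rather than value.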