I am processing large files that contain many redundant values (expressed with YAML's anchors and aliases). The processing I do on each structure is expensive, and I would like to detect whether I'm looking at an alias to an anchor I've already processed. In Python (with python-yaml), I did this by simply building a dictionary keyed by id(node). Since yaml-cpp uses Node as a reference type, however, this does not seem to work here. Any suggestions?
This is similar to "Retrieve anchor & alias string in yaml-cpp from document". That feature would be sufficient to solve my problem, but it is not necessary: if I could somehow get a hash based on the internal address of the node, for example, that would be fine.
The expensive thing I'm doing is computing a hash of each node including itself and its children.
Here is a patch against yaml-cpp 0.5.1 (in diff -n format) that seems to do what I need. Proceed with caution.
diff -nr include/yaml-cpp/node/detail/node.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/detail/node.h
a13 1
#include <boost/functional/hash.hpp>
a24 1
std::size_t identity_hash() const { return boost::hash<node_ref*>()(m_pRef.get()); }
diff -nr include/yaml-cpp/node/impl.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/impl.h
a175 5
inline std::size_t Node::identity_hash() const
{
    return m_pNode->identity_hash();
}
diff -nr include/yaml-cpp/node/node.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/node.h
a55 2
std::size_t identity_hash() const;
With the patch applied, I can use the following std::hash specialization to make an unordered_map with YAML::Node as the key.
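#include <cstddef>
#include <functional>
#include <yaml-cpp/yaml.h>

namespace std {
// Hash a YAML::Node by the identity of the data it refers to (requires the
// identity_hash() member added by the patch above), not by its contents.
template <>
struct hash<YAML::Node> {
    size_t operator()(const YAML::Node& ss) const {
        return ss.identity_hash();
    }
};
}  // namespace std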
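To show how the identity-keyed map avoids repeating the expensive work, here is a minimal sketch that does not depend on yaml-cpp at all. `Handle`, `Data`, `make_node`, `structural_hash`, and `demo` are hypothetical stand-ins I made up for illustration: `Handle` mimics a reference-type node whose copies share one underlying object, and `identity_hash()` / `is()` play the role of the patched `YAML::Node::identity_hash()` and an identity comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Data;  // defined below; Handle only needs a pointer to it

// Hypothetical stand-in for a reference-type node: copies share one Data.
class Handle {
public:
    explicit Handle(std::shared_ptr<Data> data) : data_(std::move(data)) {}
    const Data* get() const { return data_.get(); }
    // Hash of the underlying address, analogous to the patched identity_hash().
    std::size_t identity_hash() const { return std::hash<const Data*>()(data_.get()); }
    // True when both handles refer to the same underlying object.
    bool is(const Handle& other) const { return data_ == other.data_; }
private:
    std::shared_ptr<Data> data_;
};

struct Data {
    std::string scalar;
    std::vector<Handle> children;
};

struct IdentityHash {
    std::size_t operator()(const Handle& h) const { return h.identity_hash(); }
};
struct IdentityEqual {
    bool operator()(const Handle& a, const Handle& b) const { return a.is(b); }
};
using Cache = std::unordered_map<Handle, std::size_t, IdentityHash, IdentityEqual>;

Handle make_node(std::string scalar, std::vector<Handle> children = {}) {
    auto data = std::make_shared<Data>();
    data->scalar = std::move(scalar);
    data->children = std::move(children);
    return Handle(std::move(data));
}

// The "expensive" hash over a node and its children, memoized by identity.
// `computed` counts how many nodes were actually hashed (cache misses).
std::size_t structural_hash(const Handle& node, Cache& cache, int& computed) {
    auto it = cache.find(node);
    if (it != cache.end()) return it->second;  // already seen this object
    ++computed;
    std::size_t h = std::hash<std::string>()(node.get()->scalar);
    for (const Handle& child : node.get()->children) {
        // Same mixing step as boost::hash_combine.
        h ^= structural_hash(child, cache, computed) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    cache.emplace(node, h);
    return h;
}

// Builds a tree in which one subtree is referenced three times and returns
// the number of nodes that had to be hashed.
int demo() {
    Handle shared = make_node("expensive", {make_node("a"), make_node("b")});
    Handle root = make_node("root", {shared, shared, shared});
    Cache cache;
    int computed = 0;
    structural_hash(root, cache, computed);
    Handle copy = shared;            // a copy is a different C++ object...
    assert(cache.count(copy) == 1);  // ...but hits the same cache entry
    return computed;
}
```

In `demo()`, the shared subtree is hashed once even though it appears three times under the root. Two details matter for the real thing as well: the map needs an identity-based equality to go with the identity hash (as far as I know, yaml-cpp's `operator==` on `Node` compares identity via `Node::is()`, but check this for your version), and keeping the handle itself as the key keeps the underlying object alive, so an address cannot be reused by a different node while it is still in the cache.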