Tags: c++, qt, qvariant, stdhash

How to hash QVariant?


I need to use QList<QVariant> as the key of a std::unordered_map. The goal is to speed up lookups in a table of data by building an index over its unique key columns.

So I wrote this code. It's not complete, but it covers some basic data types that occur in the table's key columns:

#include <unordered_map>
#include <string>
//std::hash
#include <functional>
//std::size_t
#include <cstddef>
//Qt types used below
#include <QList>
#include <QString>
#include <QVariant>
// Hashing method for QVariantList
namespace std {
    template <>
    struct hash<QList<QVariant>>
    {
        std::size_t operator()(const QList<QVariant>& k) const
        {
            using std::size_t;
            using std::hash;
            using std::string;
            size_t hash_num = 0;
            Q_FOREACH(const QVariant& var, k) {
                // Make hash of the primitive value of the QVariant
                switch(var.type()) {
                    case QVariant::String : {
                        hash_num = hash_num^hash<string>()(var.toString().toStdString());
                        break;
                    }
                    case QVariant::Char :
                    case QVariant::ULongLong :
                    case QVariant::UInt :
                    case QVariant::LongLong :
                    case QVariant::Int : {
                        hash_num = hash_num^hash<long long>()(var.toLongLong());
                        break;
                    }
                    case QVariant::Double : {
                        hash_num = hash_num^hash<double>()(var.toDouble());
                        break;
                    }
                    default :
                        // Other types are not hashed yet
                        break;
                }
            }
            return hash_num;
        }
    };
}

Obviously, I don't like the whole switch thing. It's long, ugly code and only accounts for the basic types. I'd rather hash the memory allocated for the QVariant's internal data. Or, even better, use one of Qt's own hashing methods.

Is there a semi-reliable* way to hash any QVariant without converting it to a primitive type?

*I understand that complex objects might be hiding behind QVariant, but cases where this would lead to collision are rare enough so I don't have to care.


Solution

  • Get yourself a QByteArray + QBuffer + QDataStream to basically serialize QVariants to the QByteArray.

    Then simply hash the raw bytes in the byte array. Qt already implements a qHash function for QByteArray so you are all set.

    You can maximize efficiency by reusing the same QByteArray with enough preallocated bytes to avoid reallocations. You can wrap the whole thing in a QVariantHasher class, simply seek(0) on the buffer before each new hashing, and hash only the first pos() bytes instead of the whole array.

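    #include <QBuffer>
    #include <QByteArray>
    #include <QDataStream>
    #include <QHash>
    #include <QVariant>

    class QVariantHasher {
      public:
        QVariantHasher() : buff(&bb), ds(&buff) {
          bb.reserve(1000);                 // preallocate to avoid reallocations
          buff.open(QIODevice::WriteOnly);
        }
        uint hash(const QVariant & v) {
          buff.seek(0);                     // overwrite the previous contents
          ds << v;                          // writes the type and the value
          // Hash only the bytes written by this call
          return qHashBits(bb.constData(), buff.pos());
        }
      private:
        // Declaration order matters: bb is constructed before buff, buff before ds
        QByteArray bb;
        QBuffer buff;
        QDataStream ds;
    };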
    

    It is pretty fast, as mentioned in the comments, and it has the advantage of working with every type that supports QDataStream serialization. For custom types you only have to implement the serialization; there is no need to write and maintain a giant switch. If you already have the switch version implemented, a comparison would be interesting to make. The switch itself is a lot of branching, while reusing the same byte array is very cache friendly, especially if you don't use too many bytes, that is, if you are not hashing variants that contain very long strings or arrays.

    Also, it is better than semi-reliable, as the hashing includes the variant type as well. Even in cases where the actual data is binary identical, for example two bytes each with value 255 versus a short with value 65535, the hash incorporates the type, so the values would not collide.
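The type-tag argument can be illustrated without Qt. The following is only a sketch in standard C++: the Tag values, the one-byte tag layout, and the FNV-1a hash are assumptions chosen for illustration and are not QDataStream's actual wire format or Qt's qHashBits algorithm.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type tags; QDataStream uses its own type ids.
enum class Tag : std::uint8_t { TwoBytes = 1, UShort = 2 };

// FNV-1a over raw bytes, standing in for qHashBits.
std::uint64_t fnv1a(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t h = 14695981039346656037ULL;  // 64-bit FNV offset basis
    for (std::uint8_t b : bytes) {
        h ^= b;
        h *= 1099511628211ULL;                  // 64-bit FNV prime
    }
    return h;
}

// Prefix the payload with its type tag, as a serializer would.
std::vector<std::uint8_t> serialize(Tag tag,
                                    const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out;
    out.reserve(payload.size() + 1);
    out.push_back(static_cast<std::uint8_t>(tag));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

Here the payload {0xFF, 0xFF} is the same whether it is read as two bytes of 255 or as a short of 65535, so hashing the payload alone gives one value for both. Once the tag is prepended, the two byte sequences differ in their first byte and no longer hash to the same value.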