I need to calculate the hash value of a large vector. Currently I do it with a for-loop and boost::hash_combine as shown below, but it is too slow: it takes close to 10 ms for a vector of 500000 elements, and ideally I'd like to bring it down to 1 ms or less. Is there a way to compute the hash faster (in one shot, perhaps?) for the contents of a contiguous block of memory like a vector, instead of iterating over the entire vector with a for-loop?
#include <random>
#include <algorithm>
#include <functional> // bind
#include <iterator>
#include <iostream>
#include <vector>
#include <chrono>
#include <boost/functional/hash.hpp>
using namespace std;
int main ()
{
    vector<double> myContainer(500000, 0.0);
    uniform_real_distribution<double> unif(0.0,1.0);
    mt19937 re(std::random_device{}());
    auto generator = std::bind(unif, std::ref(re));
    generate(begin(myContainer), end(myContainer), generator);
    cout << "myContainer[0] = " << myContainer[0] << ", myContainer[L-1] = " << myContainer[myContainer.size()-1] << std::endl;
    size_t hashValBoost(0); // type supported by boost::hash_combine
    uint64_t startTime_us = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
    for(size_t i=0; i<myContainer.size(); ++i)
    {
        boost::hash_combine(hashValBoost, myContainer[i]);
    }
    uint64_t endTime_us = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
    cout << "ContainerSize = " << myContainer.size() <<"; Container hash = " << hashValBoost << ", TimeToComputeHash(ms) = " << (endTime_us - startTime_us)/1000.0 << "ms" << std::endl;
    return 0;
}
EDIT: I am constrained to build without optimizations turned on and with a C++11 or older compiler.
Run code here: https://wandbox.org/permlink/gxVxZ8QE53DhtZde
If your vector does not contain NaNs or negative zeroes, you can leverage more heavily optimized byte array hashing algorithms:
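#include <cstddef>      // std::size_t, std::byte
#include <functional>   // std::hash
#include <span>         // std::span, std::as_bytes (C++20)
#include <string_view>  // std::string_view (C++17)

std::size_t hash_bytes(std::span<const std::byte> sp) {
    // View the raw bytes as characters so the standard string hash can be reused.
    std::string_view sv(reinterpret_cast<const char*>(sp.data()), sp.size());
    return std::hash<std::string_view>{}(sv);
    // Or some other byte-based hashing algorithm, e.g.
    // boost::hash_range, which is optimized for std::byte and unsigned char:
    // return boost::hash_range(sp.begin(), sp.end());
}
// Usage: hashVal = hash_bytes(std::as_bytes(std::span(myContainer)));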
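Since the question is limited to C++11, where std::span and std::string_view are not available, the same idea (hashing the vector's contiguous bytes) can be sketched with a plain pointer and length. The names fnv1a_bytes and hash_vector_bytes are hypothetical, and FNV-1a is a simple, well-known byte hash swapped in purely for illustration; it is not what std::hash or boost uses, and it is not tuned. Whether it beats the hash_combine loop in an unoptimized build would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over a raw byte range (illustrative choice of algorithm).
std::uint64_t fnv1a_bytes(const unsigned char* data, std::size_t count) {
    assert(data != nullptr || count == 0);       // a null pointer is only valid for an empty range
    std::uint64_t h = 14695981039346656037ULL;   // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < count; ++i) {
        h ^= data[i];                            // mix in one byte
        h *= 1099511628211ULL;                   // FNV 64-bit prime
    }
    return h;
}

// Hash the object representation of the doubles stored in the vector.
// Same caveat as above: values that compare equal but have different bytes
// (such as 0.0 and -0.0) will hash differently.
std::uint64_t hash_vector_bytes(const std::vector<double>& v) {
    // Inspecting any object through unsigned char* is permitted by the aliasing rules.
    return fnv1a_bytes(reinterpret_cast<const unsigned char*>(v.data()),
                       v.size() * sizeof(double));
}
```

Usage would be along the lines of hashVal = hash_vector_bytes(myContainer); any faster byte hash with a pointer-and-length interface could be dropped in place of fnv1a_bytes.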
Boost also has boost::hash_value(const std::vector<T>&) and boost::hash_range(It, It), which run the hash_combine loop internally. That is roughly 4x slower than the byte-based hash above, but it still works if you have negative zeroes or NaNs.
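To see why negative zeroes and NaNs matter for a byte-based hash, here is a small sketch assuming IEEE 754 doubles (the helper names same_bytes and show_caveat are hypothetical). It shows that 0.0 and -0.0 compare equal yet have different object representations, so a hash over raw bytes would give two "equal" vectors different hash values. NaNs have the opposite quirk: a NaN has the same bytes as itself but never compares equal to itself.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// True when two doubles have identical object representations.
bool same_bytes(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}

// Self-check of the two caveats (assumes IEEE 754 and no fast-math flags).
void show_caveat() {
    const double pos_zero = 0.0;
    const double neg_zero = -0.0;
    assert(pos_zero == neg_zero);             // equal as values...
    assert(!same_bytes(pos_zero, neg_zero));  // ...but the sign bit differs

    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(nan != nan);                       // NaN never equals itself...
    assert(same_bytes(nan, nan));             // ...even though the bytes match
}
```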