I'm performing comparisons of objects based on the binary presence or absence of a set of features. These features can be represented by a bit string, such as this:
10011
This bitstring has the first, fourth and fifth feature.
I'm trying to calculate the similarity of a pair of bit strings as the number of features that both share in common. For a given set of bit strings I know that they'll all have the same length, but I don't know at compile time what that length will be.
For example, these two strings have two features in common, so I'd like the similarity function to return 2:
s(10011,10010) = 2
How do I efficiently represent and compare bit-strings in C++?
You can use the std::bitset
STL class.
They can be built from bit strings, ANDed, and count the number of 1:
#include <string>
#include <bitset>
int main()
{
std::bitset<5> option1(std::string("10011")), option2(std::string("10010"));
std::bitset<5> and_bit = option1 & option2; //bitset will have 1s only on common options
size_t s = and_bit.count (); //return the number of 1 in the bitfield
return 0;
}
EDIT
If number of bits is unknown at compile time, you can use boost::dynamic_bitset<>
:
boost::dynamic_bitset<> option(bit_string);
Other parts of example don't change, since boost::dynamic_bitset<>
share a common interface with std::bitset
.