Search code examples
c++bit-manipulationsimilaritybit

How do I Quickly Compare Variable-Length Bit Strings in C++?


I'm performing comparisons of objects based on the binary presence or absence of a set of features. These features can be represented by a bit string, such as this:

10011

This bitstring has the first, fourth and fifth feature.

I'm trying to calculate the similarity of a pair of bit strings as the number of features that both share in common. For a given set of bit strings I know that they'll all have the same length, but I don't know at compile time what that length will be.

For example, these two strings have two features in common, so I'd like the similarity function to return 2:

s(10011,10010) = 2

How do I efficiently represent and compare bit-strings in C++?


Solution

  • You can use the std::bitset STL class.

    They can be built from bit strings, ANDed, and count the number of 1:

    #include <string>
    #include <bitset>
    
    int main()
    {
      std::bitset<5> option1(std::string("10011")), option2(std::string("10010"));
      std::bitset<5> and_bit = option1 & option2; //bitset will have 1s only on common options
      size_t s = and_bit.count ();                //return the number of 1 in the bitfield
      return 0;
    }
    

    EDIT

    If number of bits is unknown at compile time, you can use boost::dynamic_bitset<>:

    boost::dynamic_bitset<> option(bit_string);
    

    Other parts of example don't change, since boost::dynamic_bitset<> share a common interface with std::bitset.