Tags: c++, performance, bit-manipulation, hamming-distance, std-bitset

XOR bitset when 2D bitset is stored as 1D


To answer "How to store binary data when you only care about speed?", I am trying to write some code to do comparisons, so I want to use std::bitset. However, for a fair comparison, I would like a 1D std::bitset to emulate a 2D one.

So instead of having:

bitset<3> b1(string("010"));
bitset<3> b2(string("111"));

I would like to use:

bitset<2 * 3> b1(string("010111"));

to optimize data locality. However, now I am having a problem with "How should I store and compute Hamming distance between binary codes?", as seen in my minimal example:

#include <vector>
#include <iostream>
#include <random>
#include <cmath>
#include <numeric>
#include <bitset>

int main()
{
    const int N = 1000000;
    const int D = 100;
    unsigned int hamming_dist[N] = {0};
    std::bitset<D> q;
    for(int i = 0; i < D; ++i)
        q[i] = 1;

    std::bitset<N * D> v;
    for(int i = 0; i < N; ++i)
        for(int j = 0; j < D; ++j)
            v[j + i * D] = 1;


    for(int i = 0; i < N; ++i)
        hamming_dist[i] += (v[i * D] ^ q).count();

    std::cout << "hamming_distance = " << hamming_dist[0] << "\n";

    return 0;
}

The error:

Georgioss-MacBook-Pro:bit gsamaras$ g++ -Wall bitset.cpp -o bitset
bitset.cpp:24:32: error: invalid operands to binary expression ('reference' (aka
      '__bit_reference<std::__1::__bitset<1562500, 100000000> >') and
      'std::bitset<D>')
                hamming_dist[i] += (v[i * D] ^ q).count();
                                    ~~~~~~~~ ^ ~
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/bitset:1096:1: note: 
      candidate template ignored: could not match 'bitset' against
      '__bit_reference'
operator^(const bitset<_Size>& __x, const bitset<_Size>& __y) _NOEXCEPT
^
1 error generated.

which occurs because it doesn't know when to stop! How can I tell it to stop after processing D bits?


I mean without using a 2D data structure.


Solution

  • The problem is that v[i * D] accesses a single bit. In your conceptual model of a 2D bit array, it accesses the bit at row i and column 0.

    So v[i * D] is a single-bit proxy (a std::bitset<N * D>::reference, which converts to bool) and q is a std::bitset<D>, and the bitwise XOR operator (^) is not defined for that pair of operands.

    If v is meant to represent a sequence of binary vectors of size D, you should use a std::vector<std::bitset<D>> instead. Also, std::bitset<N>::set() sets all bits to 1.

    #include <bitset>
    #include <cstddef>
    #include <iostream>
    #include <vector>
    
    int main()
    {
        const int N = 1000000;
        const int D = 100;
    
        // Heap-allocated, so N can be large without overflowing the stack.
        std::vector<std::size_t> hamming_dist(N);
    
        // Query code with all D bits set to 1.
        std::bitset<D> q;
        q.set();
    
        // One std::bitset<D> per code; the vector stores them contiguously.
        std::vector<std::bitset<D>> v(N);
        for (int i = 0; i < N; ++i)
        {
            v[i].set();
        }
    
        // XOR two D-bit codes and count the bits that differ.
        for (int i = 0; i < N; ++i)
        {
            hamming_dist[i] = (v[i] ^ q).count();
        }
    
        std::cout << "hamming_distance = " << hamming_dist[0] << "\n";
    
        return 0;
    }