Search code examples
c++algorithmmathdata-structureshamming-distance

Data structure for Hamming cube


I have a Hamming cube, of general dimension, but in practice, usually, the dimension ranges from 3 to 6.

The search algorithm is:

Input: any vertex, `v`.
Find all vertices that lie in Hamming distance 1 from `v`.
Find all vertices that lie in Hamming distance 2 from `v`.
...

I do not know in advance how far away from v will I need to go. I might stop at distance 1 for example.

For instance, given this cube:

enter image description here

and v = 100, I would need to the vertices at Hamming distance 1, which are 000, 101, 110 (at any order). Then, I might need to get those in distance 2, namely 111, 001, 010. In the unlikely event of needing the vertices at distance 3 too, I Would get 011 as well.

A vertex of the cube may contain IDs (integers).

Which would be an appropriate Data structure to store this cube and efficiently search it? I am not really interested in other operations.

I thought about sorting all the bit sequences somehow, so that I can easily access them, but didn't get anything to work.


My approach so far:

Use a hashtable (specifically std::unordered_map), where the keys are the vertices and the values are the IDs.

Given a vertex v, generate all sequences of bits within Hamming distance t (i.e. 1, 2, ...).

However, this requires me to call a function every time a vertex varrives (which often happens). I have a function to achieve this, based on this.


Solution

  • I'm rusty with C++, so I'll keep this abstract.

    The neighbors of a given point of your Hamming cube are easily computable. Given a vertex's bit sequence, flip each bit individually.

    You could precompute that, though. You could cache the results of your neighbors() function, or you could save them to an array. Each vertex would have its own neighbors, so you have one array for each vertex. That gives you, essentially, your adjacency list.

    With that adjacency list, you can search your Hamming cube using depth-limited search, a variant of DFS (or BFS, I guess—but space complexity is worse) that only goes k units deep.


    Your data structure is a good choice, but consider that your vertices are binary strings, so they cover all points from 0 to 2^n - 1. You might as well just use an array—lookup will still be O(1), and it'll be more compact because there aren't unused buckets sitting around.