Search code examples
pythonalgorithmlanguage-agnosticsetbit-fields

the official name of this programming approach to compute the union and the intersection


I [surely re] invented this [wheel] when I wanted to compute the union and the intersection and diff of two sets (stored as lists) at the same time. Initial code (not the tightest):

dct = {}
for a in lst1:
  dct[a] = 1
for b in lst2:
  if b in dct:
    dct[b] -= 1
  else:
    dct[b] = -1

union = [k for k in dct]
inter = [k for k in dct if dct[k] == 0]
oneminustwo = [k for k in dct if dct[k] ==  1]
twominusone = [k for k in dct if dct[k] ==  -1]

Then I realized that I should use 00, 01, 10, and 11 instead of -1, 1, 0, ... So, a bit at position n indicates membership in set n.

This can be generalized to up to 32 sets using an 32-bit int, or to any number of sets using a bitarray, or a string. So, you pre-compute this dictionary once, and then use very fast O(n) queries to extract elements of interest. For instance, all 1s means intersection of all sets. All 0s is a special one - will not occur.

Anyhow, this is not to toot my own horn. This surely was invented before and has a name. What is it called? Is this approach used in databases somewhere?


Solution

  • Yes, it is sometimes used in databases, for example PostgreSQL. As mentions Wikipedia:

    Some database systems that do not offer persistent bitmap indexes use bitmaps internally to speed up query processing. For example, PostgreSQL versions 8.1 and later implement a "bitmap index scan" optimization to speed up arbitrarily complex logical operations between available indexes on a single table.

    (from http://en.wikipedia.org/wiki/Bitmap_index)