Search code examples
algorithmcount-min-sketchstreaming-algorithm

How does count min sketch find the most frequent item in a stream? - Heavy Hitters


Count min sketch uses different hash functions to map elements in the stream to the hash function. How to map back from the sketch to find the most frequent item? Considering that enough elements have been passes(millions) and we don’t know the elements.


Solution

  • First of all the CMS in order to store data use pairwise independent hash functions to map elements in their structure (think of it as a table). Secondly, the reverse process is not supported as is, which is from the table to distinguish the distinct elements in the CMS.

    Using separate elements as queries you can retrieve their estimated count in the stream using the same family of hash functions (point query).

    In order to retrieve the most frequent item/items an additional data structure such as a heap should be used. Appart from the CMS papers, a quick and useful presentation over your question is found here: http://theory.stanford.edu/~tim/s15/l/l2.pdf