Search code examples
algorithmdata-structuresstreamingfrequency-analysis

Difference between output AMS Sketch and Count Sketch algorithm


I am trying to understand the differences between the AMS sketch and the Count Sketch algorithms. The way i understand it is that both of their goals/outputs are to return a sketch, that is a frequency vector. That contains the frequencies of elements in a passing steam. What is the difference between the two?

Intuitively it would make sense that the AMS algorithm only indicates whether or not an elment has passed by, and doesn't actually count how many times. Although I am not sure if this is correct.

Furthermore i am not sure why the need for a sketch is there in the first place. Why not just have a normal dictionary that increments a counter each time an element hashes to some value in the dictionary?

Hope that made sense. Thanks


Solution

  • Both are attempts to address the problems involved keeping counts of more elements than you actually can put in your dictionary. You cannot possibly do this, but you can solve related problems with some error rate.

    The AMS sketch tries to solve the problem of estimating various aggregate statistics correctly. Such as the sum of the squares of the frequencies.

    The count sketch tries to solve the problem of estimating individual counts correctly. So at any point you can take any particular value that you might have seen, and produce an estimate of how many times you have seen it. This estimate is unbiased, equally likely to be high or low.

    The count-min sketch is like the count sketch except that it provides an upper bound of how many times you have seen it. (The "min" refers to a min that you take inside of the algorithm.)