How to calculate frequencies of top K values in the stream?
Let's say we have a stream
CREATE STREAM stream (
value number
);
And we inserted ten rows
INSERT INTO stream (value) VALUES (1)
INSERT INTO stream (value) VALUES (1)
INSERT INTO stream (value) VALUES (1)
INSERT INTO stream (value) VALUES (2)
INSERT INTO stream (value) VALUES (2)
INSERT INTO stream (value) VALUES (3)
INSERT INTO stream (value) VALUES (4)
INSERT INTO stream (value) VALUES (5)
INSERT INTO stream (value) VALUES (6)
INSERT INTO stream (value) VALUES (7)
How can I get back the top 2 items and their frequencies?
value | frequency
-----------------
1 | 0.3
2 | 0.2
I suppose it should somehow use both Top K and the Count-min Sketch together?
You can use fss_agg for that:
CREATE CONTINUOUS VIEW v AS
SELECT fss_agg(x, 10) AS top_10_x FROM some_stream
This will keep track of the top 10
most frequently occurring values of x
. The weight given to each value can also be explicitly given:
CREATE CONTINUOUS VIEW v AS
SELECT fss_agg_weighted(x, 10, y) AS top_10_x FROM some_stream
The first version implicitly uses a weight of 1
.
There are various functions you can use to read the top-K values and their associated frequencies. For example, the following will return tuples of the form: (value, frequency)
:
SELECT fss_topk(top_10_x) FROM v