Search code examples
apache-storm

Differnces between __execute-count value and values gathered by the Metrics Reporting API v2


I have run a topology, and I used the Meter type in metric Reporting API v2. In the execute method I mark this metric. So it will mark an event whenever the execute method is called. But when I compare this value with the __execute-count, I see huge differences. Does anyone know why this happens?

These are the values from my log which are gathered at the same time:

9:v7 __execute-count {v0:v7=44500}
9:v7 tuple_inRate.count 664129

Update: When I use the mark method on the Meter metric, I will get different results in comparison with the Counter metric. But still, I do not understand why the values from the counter metric (tuple counter) are not the same as the __execute-count.


Solution

  • As given in this answer, Storms Internal Metrics are just estimated by a percentage of the real data flow. Initially, it uses 5% of incoming tuples to make those estimations. This may lead to inaccuracies for extreme high or low throughputs.

    EDIT: The documentation describes the following:

    In general all of these tuple count metrics are randomly sub-sampled unless otherwise stated. This means that the counts you see both on the UI and from the built in metrics are not necessarily exact. In fact by default we sample only 5% of the events and estimate the total number of events from that. The sampling percentage is configurable per topology through the topology.stats.sample.rate config. Setting it to 1.0 will make the counts exact, but be aware that the more events we sample the slower your topology will run (as the metrics are counted in the same code path as tuples are processed). This is why we have a 5% sample rate as the default.

    EDIT 2 In this post, there is more information about the estimation:

    The way it works is that if you choose a sampling rate of 0.05, it will pick a random element of the next 20 events in which to increase the count by 20. So if you have 20 tasks for that bolt, your stats could be off by +-380.

    By the way, execute_count is just an increasing number, while your tuple_inRate.count is a rate, isn`t it?