
Prometheus: native histograms


Here, the Prometheus docs mention native histograms:

NOTE: Beginning with Prometheus v2.40, there is experimental support for native histograms. A native histogram requires only one time series, which includes a dynamic number of buckets in addition to the sum and count of observations. Native histograms allow much higher resolution at a fraction of the cost. Detailed documentation will follow once native histograms are closer to becoming a stable feature.

I don't understand what this paragraph means: what role does a dynamic number of buckets play here, and how does it give higher resolution?

Suppose I wanted to implement a native histogram myself: how would it differ from a classic histogram implementation?


Solution

  • First, let's look at the definitions of resolution and zero-bucket-width:

    Resolution:

    The resolution defines the number of buckets per power of ten. It is configurable (from 1 up to some reasonable upper limit)... Note, however, that Histograms with different resolutions are only mergeable if the higher resolution is a multiple of the lower resolution... For the final implementation, we should therefore consider restricting the resolution to powers of two so that all possible resolutions will be mergeable... The spacing of buckets is logarithmic, with bucket boundaries guaranteed at powers of 10. A duration Histogram with a resolution of 1 has boundaries like …, 10ms, 100ms, 1s, 10s, … With a resolution of 3, the boundaries would be …, 10ms, 21.5ms, 46.4ms, 100ms, 215ms, 464ms, 1s.

    Zero-bucket-width:

    The zero-bucket-width defines a special bucket around zero. All observations with an absolute value below or equal to the zero-bucket-width are counted in this special bucket. This avoids an explosion of the bucket counts for observations very close to zero and allows observations of the value zero in the first place. The precise value for the zero-bucket-width is arbitrary.
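    The following Go sketch shows why the zero bucket is needed. It assumes the design-doc layout in which bucket i covers (10^((i-1)/resolution), 10^(i/resolution)]; the name `bucketIndex` is hypothetical. Every decade closer to zero adds `resolution` more bucket indices, and log10(0) is -Inf, so zero has no logarithmic bucket at all. Values with |v| <= zeroWidth are short-circuited into the zero bucket instead; the width 1e-5 is an arbitrary choice for the demo.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex maps an observation to a bucket of the design-doc layout, where
// bucket i covers (10^((i-1)/resolution), 10^(i/resolution)]. Observations with
// |v| <= zeroWidth go to the zero bucket instead (inZero == true); this also
// catches v == 0, whose logarithm would be -Inf. Negative values are indexed by
// their absolute value (a full implementation keeps separate negative buckets).
func bucketIndex(v float64, resolution int, zeroWidth float64) (idx int, inZero bool) {
	a := math.Abs(v)
	if a <= zeroWidth {
		return 0, true
	}
	return int(math.Ceil(math.Log10(a) * float64(resolution))), false
}

func main() {
	const resolution = 3
	const zeroWidth = 1e-5 // arbitrary, as the design doc notes
	for _, v := range []float64{0.002, 2e-6, 2e-9, 0} {
		// Raw logarithmic index: drops by `resolution` per decade, -Inf for 0.
		raw := math.Ceil(math.Log10(v) * resolution)
		idx, inZero := bucketIndex(v, resolution, zeroWidth)
		if inZero {
			fmt.Printf("v=%g: raw index %v -> zero bucket\n", v, raw)
		} else {
			fmt.Printf("v=%g: raw index %v -> bucket %d\n", v, raw, idx)
		}
	}
}
```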

    Native histograms (a.k.a. sparse high-resolution histograms) are not fully implemented yet, so you should read their design doc. It's also useful to look at the PR that introduced them. Compared with classic histograms, they offer fewer time series, higher resolution, aggregatable (mergeable) bucket schemas, etc.
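    To illustrate what "aggregatable" means, here is a Go sketch of merging two sparse histograms recorded at different resolutions, using the design-doc layout (bucket i covers (10^((i-1)/r), 10^(i/r)]). As quoted above, it assumes the higher resolution is a multiple f of the lower one: then each coarse bucket i is exactly the union of the fine buckets f(i-1)+1 through f·i, so a fine index j maps to ceil(j/f). The names `ceilDiv`, `downscale` and `merge` are hypothetical, and zero and negative buckets are left out for brevity.

```go
package main

import "fmt"

// ceilDiv returns ceil(a/b) for b > 0. Go's integer division truncates toward
// zero, so positive non-exact quotients need rounding up.
func ceilDiv(a, b int) int {
	q := a / b
	if a%b != 0 && a > 0 {
		q++
	}
	return q
}

// downscale maps buckets recorded at resolution fineRes to resolution
// coarseRes. With f = fineRes/coarseRes, coarse bucket i is exactly the union
// of fine buckets f*(i-1)+1 ... f*i, so fine index j lands in ceil(j/f).
func downscale(buckets map[int]uint64, fineRes, coarseRes int) (map[int]uint64, error) {
	if coarseRes <= 0 || fineRes%coarseRes != 0 {
		return nil, fmt.Errorf("resolution %d is not a multiple of %d", fineRes, coarseRes)
	}
	f := fineRes / coarseRes
	out := make(map[int]uint64, len(buckets))
	for j, count := range buckets {
		out[ceilDiv(j, f)] += count
	}
	return out, nil
}

// merge adds the buckets of two histograms after converting the one with the
// higher resolution down to the lower resolution. It returns the merged
// buckets and the resolution they are expressed in.
func merge(a map[int]uint64, resA int, b map[int]uint64, resB int) (map[int]uint64, int, error) {
	if resA < resB {
		a, resA, b, resB = b, resB, a, resA // make a the higher-resolution side
	}
	out, err := downscale(a, resA, resB)
	if err != nil {
		return nil, 0, err
	}
	for i, count := range b {
		out[i] += count
	}
	return out, resB, nil
}

func main() {
	fine := map[int]uint64{-5: 1, -4: 2, 1: 3, 2: 4} // resolution 6
	coarse := map[int]uint64{-2: 10, 1: 20}          // resolution 3
	merged, res, err := merge(fine, 6, coarse, 3)
	fmt.Println(merged, res, err) // map[-2:13 1:27] 3 <nil>

	_, _, err = merge(fine, 6, coarse, 4) // 6 is not a multiple of 4
	fmt.Println(err)
}
```

    Restricting resolutions to powers of two, as the design doc suggests, guarantees that the higher of any two resolutions is a multiple of the lower one, so `downscale` never fails.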

    The problem with classic Prometheus histograms is described well as follows:

    They require the user to pick a suitable bucket layout during instrumentation, which is made even harder by the fact that buckets are expensive. (Every bucket creates a separate time series on the Prometheus servers and a separate line in the text-based exposition format.) Many users use histograms to calculate φ-quantiles (e.g. 99th percentile, median, …). The error of the quantile estimation depends on the width of the bucket the quantile value falls into. But given the cost of buckets, most Prometheus Histograms have very wide buckets or only cover very specific ranges with narrow buckets.

    To prevent Histograms from taking most of the cardinality budget, users tend to have very few buckets per Histogram (3–10), leading to huge errors in quantile estimation (unless the quantile value falls into a carefully selected value range with a few narrow buckets).
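    The quantile error mentioned above comes from interpolation inside a bucket. Below is a simplified Go sketch of the linear interpolation that PromQL's `histogram_quantile` applies to classic buckets (it assumes observations are spread uniformly within a bucket; the real function handles more edge cases, such as the +Inf bucket). The type and function names are hypothetical, and the scenario is made up: 100 requests that all take 0.3s, so the true p99 is 0.3s. With one wide bucket (0.1s, 1s] the estimate lands near 0.99s; with a narrow logarithmic bucket around 0.3s (resolution 10, i.e. (10^-0.6, 10^-0.5], roughly (0.251s, 0.316s]) it cannot leave that bucket.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is one cumulative classic-histogram bucket: Count observations are <= UpperBound.
type Bucket struct {
	UpperBound float64
	Count      float64
}

// estimateQuantile returns the q-quantile (0 <= q <= 1) by locating the bucket
// that contains rank q*total and interpolating linearly inside it. buckets must
// be sorted by UpperBound and cumulative; the first bucket's lower bound is
// taken as 0, and the +Inf bucket is not modeled.
func estimateQuantile(q float64, buckets []Bucket) float64 {
	total := buckets[len(buckets)-1].Count
	rank := q * total
	lowerBound, lowerCount := 0.0, 0.0
	for _, b := range buckets {
		if rank <= b.Count {
			inBucket := b.Count - lowerCount
			if inBucket == 0 {
				return b.UpperBound
			}
			// Uniform-within-bucket assumption: move proportionally through the bucket.
			return lowerBound + (b.UpperBound-lowerBound)*(rank-lowerCount)/inBucket
		}
		lowerBound, lowerCount = b.UpperBound, b.Count
	}
	return buckets[len(buckets)-1].UpperBound
}

func main() {
	// 100 observations of 0.3s each, so the true 0.99-quantile is 0.3s.
	wide := []Bucket{{0.1, 0}, {1, 100}}
	narrow := []Bucket{{math.Pow(10, -0.6), 0}, {math.Pow(10, -0.5), 100}}
	fmt.Printf("wide buckets:   p99 ~ %.3fs\n", estimateQuantile(0.99, wide))
	fmt.Printf("narrow buckets: p99 ~ %.3fs\n", estimateQuantile(0.99, narrow))
}
```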

    So far, you have a sense of what is wrong with classic Prometheus histograms. Now, let's take a glimpse at the solution (native histograms):

    A new kind of histogram with a regular logarithmic bucket layout of a relatively high resolution and a fundamentally infinite number of buckets...

    The bucket layout of the Sparse Histogram is defined by only two parameters, a resolution and a zero-bucket-width.

    To enable this feature on your Prometheus server, start it with the corresponding feature flag:

    ./prometheus --enable-feature=native-histograms
    

    Furthermore, you can take a look at the Prometheus feature flags documentation. The Prometheus Go client library (client_golang) also supports native histograms; on the instrumentation side, all you need to do is set NativeHistogramBucketFactor in the HistogramOpts of your histogram.

    For more info, you can watch this YouTube talk by one of its contributors.