prometheus, monitoring, metrics, prometheus-node-exporter, prom-client

What happens to application metrics (e.g. CPU used by a process) that fall between Prometheus scrapes?


How does Prometheus collect CPU information during the intervals when it doesn't scrape? For example, I have scrape_interval: 15s, and CPU spikes up to 90% during the 15 seconds in which Prometheus did not scrape. Will I lose this important information because it is aggregated into the average CPU used by the process, rate(process_cpu_system_seconds_total[15s]) * 100?

I just need to understand: if the scrape interval is n seconds, is the monitoring data for those n seconds collected, or is it just lost?


Solution

  • It's not "lost", but you're correct that it's never observed.

    Almost all measurements suffer from errors introduced by this necessary approximation, or downsampling.

    A consequence is that any calculation on the measurements is only as good as the data that was captured.

    The problem is exacerbated when sampled data is further "sampled" to minimize storage, for example by retaining only daily data for periods beyond the last month.

    E.g. assume the following is a perfect record of some measurement:

    1,2,1,9,1,4,1,1,1,9

    If sampling retrieves every other value:

    1,1,1,1,1

    This is almost entirely unrepresentative of the data:

    mean: 1 (sampled) vs 3 (full record)

    p90: 1 (sampled) vs 9 (full record)
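
The comparison above can be reproduced with a minimal Python sketch. It is only an illustration of the arithmetic, not of how Prometheus stores or queries data. The helper `nearest_rank_percentile` is a hypothetical name introduced here, and the nearest-rank definition of p90 is an assumption, since other percentile definitions can give different values on small samples.

```python
from statistics import mean


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed definition): the smallest value
    such that at least pct percent of the data is at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# The "perfect record" from the example above.
full = [1, 2, 1, 9, 1, 4, 1, 1, 1, 9]

# Sampling that retrieves every other value, starting with the first.
sampled = full[::2]

full_mean = mean(full)
sampled_mean = mean(sampled)
full_p90 = nearest_rank_percentile(full, 90)
sampled_p90 = nearest_rank_percentile(sampled, 90)

print("sampled values:", sampled)
print("mean:", sampled_mean, "vs", full_mean)
print("p90:", sampled_p90, "vs", full_p90)
```

Starting the sample one position later (`full[1::2]`) picks up both spikes instead of neither, which shows how much the result depends on where the samples happen to fall.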