
Fixed-Rate sampling in Application Insights poor accuracy


For analytics purposes, I need to analyze the number of requests made to my server, and I am using Application Insights. Because the data volume is massive, I decided to apply Fixed-Rate Sampling of 5% to limit costs. After running some tests, I got the following results:

  • 800 real requests -> 32 samples (4%)
  • 200 real requests -> 14 samples (7%)

I expected a sampling rate of 5%, but my empirical data shows poor accuracy. Is this behavior normal? Why does it happen? Can I consider the data representative when the request volume is high?


Solution

  • Sampling in Application Insights uses a randomized approach:

    • When a new span starts, an operation id is randomly generated
    • The operation id is hashed and converted to a number in the [0..1] range
    • If this number is less than or equal to the configured sampling ratio, the telemetry is sampled in; otherwise it is discarded
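    The three steps above can be sketched roughly as follows. This is only an illustration: the hash below is an FNV-1a-style stand-in chosen for brevity, not the hash Application Insights uses, and the names `HashSamplingSketch`, `Score`, and `IsSampledIn` are hypothetical.

```csharp
using System;

static class HashSamplingSketch
{
    // Hypothetical stand-in hash (FNV-1a style over UTF-16 chars), NOT the SDK's hash.
    // Maps an operation id to a number in [0, 1).
    public static double Score(string operationId)
    {
        uint hash = 2166136261u;            // FNV offset basis
        unchecked
        {
            foreach (char c in operationId)
            {
                hash ^= c;
                hash *= 16777619u;          // FNV prime, wraps around on overflow
            }
        }
        return hash / 4294967296.0;         // scale by 2^32
    }

    // The same id always yields the same decision, so it is kept or dropped consistently.
    public static bool IsSampledIn(string operationId, double ratio)
    {
        return Score(operationId) <= ratio;
    }

    static void Main()
    {
        const double ratio = 0.05;
        const int requests = 10000;
        int kept = 0;
        for (int i = 0; i < requests; i++)
        {
            string operationId = Guid.NewGuid().ToString("N"); // random id per operation
            if (IsSampledIn(operationId, ratio))
            {
                kept++;
            }
        }
        Console.WriteLine($"Kept {kept} of {requests} ({100.0 * kept / requests:F2}%)");
    }
}
```

    Because the decision depends only on the id, nothing forces exactly one in twenty operations to be kept; the kept fraction is itself a random quantity.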

    A 5% rate does not mean that every 20th document is collected. Each operation is, in effect, kept with 5% probability independently of the others, so with small numbers you can indeed see the observed rate deviate from the configured 5%. With larger numbers it should get closer and closer to 5% (the law of large numbers).
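    As a rough check against the numbers in the question, assume each of N requests is kept independently with probability p = 0.05 (a simplification of the hash-based scheme). The number of samples k is then binomially distributed:

```latex
\[
\mathbb{E}[k] = Np, \qquad \sigma_k = \sqrt{Np(1-p)}, \qquad
\frac{\sigma_k}{\mathbb{E}[k]} = \sqrt{\frac{1-p}{Np}}
\]
\[
N = 800:\quad \mathbb{E}[k] = 40,\quad \sigma_k = \sqrt{38} \approx 6.16,\quad
\frac{40 - 32}{6.16} \approx 1.3
\]
\[
N = 200:\quad \mathbb{E}[k] = 10,\quad \sigma_k = \sqrt{9.5} \approx 3.08,\quad
\frac{14 - 10}{3.08} \approx 1.3
\]
```

    Under that assumption, both observations (32 and 14 samples) sit about 1.3 standard deviations from the expected value, which is unremarkable. The relative spread shrinks in proportion to 1/√N: roughly 4.4% at N = 10,000 and 0.44% at N = 1,000,000.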

    Here is sample code that illustrates this (it is not how Application Insights implements sampling; it only illustrates how random-based sampling works):

        using System;

        class Program
        {
            static void Main(string[] args)
            {
                var random = new Random();

                const int numRuns = 50;     // number of independent experiments
                const int numPerRun = 200;  // simulated requests per experiment
                int total = 0;              // sampled-in count across all runs
                for (int j = 0; j < numRuns; j++)
                {
                    int totalPerRun = 0;
                    for (int i = 0; i < numPerRun; i++)
                    {
                        // Next(100) returns 0..99, so values 0..4 occur with 5% probability
                        var next = random.Next(100);
                        if (next < 5)
                        {
                            ++totalPerRun;
                            ++total;
                        }
                    }

                    Console.WriteLine($"Run #{j}: {totalPerRun} out of {numPerRun}");
                }

                Console.WriteLine($"Total: {total} out of {numRuns * numPerRun}");
            }
        }
    

    Here is the output. Note that across all 10,000 simulated requests the sampled share is close to 5% (487, or 4.87%), while individual runs of 200 vary widely around it, from 4 (2%) to 18 (9%):

    Run #0: 12 out of 200
    Run #1: 6 out of 200
    Run #2: 12 out of 200
    Run #3: 7 out of 200
    Run #4: 8 out of 200
    Run #5: 8 out of 200
    Run #6: 6 out of 200
    Run #7: 12 out of 200
    Run #8: 7 out of 200
    Run #9: 11 out of 200
    Run #10: 18 out of 200
    Run #11: 11 out of 200
    Run #12: 9 out of 200
    Run #13: 11 out of 200
    Run #14: 11 out of 200
    Run #15: 10 out of 200
    Run #16: 13 out of 200
    Run #17: 7 out of 200
    Run #18: 9 out of 200
    Run #19: 15 out of 200
    Run #20: 4 out of 200
    Run #21: 6 out of 200
    Run #22: 9 out of 200
    Run #23: 8 out of 200
    Run #24: 10 out of 200
    Run #25: 10 out of 200
    Run #26: 6 out of 200
    Run #27: 13 out of 200
    Run #28: 9 out of 200
    Run #29: 5 out of 200
    Run #30: 15 out of 200
    Run #31: 9 out of 200
    Run #32: 9 out of 200
    Run #33: 12 out of 200
    Run #34: 10 out of 200
    Run #35: 8 out of 200
    Run #36: 13 out of 200
    Run #37: 8 out of 200
    Run #38: 10 out of 200
    Run #39: 9 out of 200
    Run #40: 8 out of 200
    Run #41: 6 out of 200
    Run #42: 10 out of 200
    Run #43: 11 out of 200
    Run #44: 14 out of 200
    Run #45: 10 out of 200
    Run #46: 7 out of 200
    Run #47: 13 out of 200
    Run #48: 13 out of 200
    Run #49: 9 out of 200
    Total: 487 out of 10000
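
    On the question of whether sampled data is representative at high volume, one way to reason about it is to scale a sampled count back up and attach an approximate margin of error. The sketch below assumes independent sampling and uses a normal approximation to the binomial distribution, which is only rough for small counts; `SampledCountEstimate` and `FromSample` are hypothetical names, not part of any SDK.

```csharp
using System;

static class SampledCountEstimate
{
    // Estimates the real request count from a sampled count at a fixed sampling rate.
    // Margin is an approximate 95% half-width (1.96 standard deviations).
    public static (double Estimate, double Margin) FromSample(int sampledCount, double rate)
    {
        double estimate = sampledCount / rate;
        // Standard deviation of (sampledCount / rate), with the unknown real count
        // replaced by its estimate: sqrt(k * (1 - p)) / p.
        double stdDev = Math.Sqrt(sampledCount * (1.0 - rate)) / rate;
        return (estimate, 1.96 * stdDev);
    }

    static void Main()
    {
        // 32 and 14 are the counts from the question; 487 is the simulation total.
        foreach (int sampled in new[] { 32, 14, 487 })
        {
            var (estimate, margin) = FromSample(sampled, 0.05);
            Console.WriteLine($"{sampled} samples -> about {estimate:F0} +/- {margin:F0} requests");
        }
    }
}
```

    Under these assumptions, 32 samples give roughly 640 ± 216 requests and 14 samples give roughly 280 ± 143, so the real counts of 800 and 200 both fall inside the margins. With 487 samples the margin drops to roughly ± 843 around 9,740, which is under 9% of the estimate: the relative error shrinks as the volume grows.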