Tags: newrelic, apdex

How do I arrive at an Apdex threshold value based on the SLA?


We have a REST API available. For each of the endpoints this API offers, we have a defined SLA based on internal testing. New Relic provides an option to define the Apdex T (threshold) on a per-application basis. Consider the following scenario:

  • Endpoint A: SLA is 200ms
  • Endpoint B: SLA is 800ms
  • Average SLA: 500ms

    Case 1: Use the average SLA (500ms) as the Apdex threshold. The problem with this approach is that even though endpoint A is expected to complete in 200ms, it wouldn't be flagged even if it took twice its SLA, since that would still be below the average. The reverse is true for endpoint B, which would be flagged even when it responds within its 800ms SLA.

    Case 2: Use the maximum SLA (800ms) of all the endpoints as the Apdex T value. Again, the problem here is endpoint A: a slow response from it wouldn't be flagged even if it took 4 times its expected time.
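To make the two cases concrete, here is a small Python sketch (not New Relic code) that applies the standard Apdex zones (satisfied at or below T, tolerating up to 4T, frustrated above 4T) to one observed response time per endpoint. The observed timings (400ms for A, 600ms for B) are hypothetical values chosen for illustration.

```python
def apdex_zone(duration_ms: float, t_ms: float) -> str:
    """Classify a single response time into an Apdex zone for threshold T."""
    if duration_ms <= t_ms:
        return "satisfied"
    if duration_ms <= 4 * t_ms:
        return "tolerating"
    return "frustrated"


# Hypothetical endpoints: (SLA in ms, one observed response time in ms).
endpoints = {
    "A": (200, 400),  # twice its 200ms SLA
    "B": (800, 600),  # comfortably inside its 800ms SLA
}

# Case 1 uses the average SLA as T, Case 2 uses the maximum SLA.
for label, t_ms in [("Case 1 (average SLA)", 500), ("Case 2 (max SLA)", 800)]:
    for name, (sla_ms, observed_ms) in endpoints.items():
        zone = apdex_zone(observed_ms, t_ms)
        print(f"{label}: endpoint {name} at {observed_ms}ms (SLA {sla_ms}ms) -> {zone}")
```

With T = 500ms, endpoint A's SLA breach counts as satisfied while endpoint B, within its SLA, counts as tolerating; with T = 800ms, both look satisfied.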

So how do we arrive at an Apdex threshold value in such scenarios? I went through the following article from New Relic: LINK. It makes sense when we look at the service as a whole, but not when we look at individual endpoints.


Solution

  • Are you sure you want to set Apdex based on your SLA?

    I would suggest that the typical performance of the application is the better metric to look at, for example how it has performed over the last 7 days. Rather than the average, though, the "How to set an Apdex T" article suggests using a percentile to represent typical performance.

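    So if you use the 90th percentile as your Apdex T, roughly 90% of requests are satisfied, which typically results in an Apdex score near 0.95 (and not much above it, since the remaining requests earn at most half a point each). An Apdex of 1 is not useful, because it means the threshold is too lenient to hold your application to account. So I would query Insights separately for each application: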

    select percentile(duration, 90) from Transaction where appName="AppA" since 7 days ago

    select percentile(duration, 90) from Transaction where appName="AppB" since 7 days ago

    This gives you the response time that 90% of your transactions meet or beat, so it should be a good rough guide to your Apdex T value.
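The following Python sketch shows why a 90th-percentile threshold lands near 0.95. It computes a nearest-rank percentile (one of several percentile definitions; New Relic's own calculation may differ) and the Apdex score (satisfied + tolerating / 2) / total over a hypothetical sample of 20 response times. The helper names and the sample data are illustrative assumptions, not anything taken from New Relic.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def apdex_score(durations, t):
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    satisfied = sum(1 for d in durations if d <= t)
    tolerating = sum(1 for d in durations if t < d <= 4 * t)
    return (satisfied + tolerating / 2) / len(durations)


# Hypothetical sample of 20 response times in ms for one application.
durations = [120, 130, 140, 150, 150, 160, 170, 180, 190, 200,
             210, 220, 230, 240, 250, 260, 280, 300, 450, 1500]

t = percentile(durations, 90)
print(f"Apdex T = {t}ms, score = {apdex_score(durations, t):.3f}")
```

Here T comes out at 300ms and the score at 0.925: 18 satisfied requests, one tolerating (450ms) and one frustrated (1500ms). If the 1500ms outlier had stayed under 4T, the score would have reached the 0.95 ceiling.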

    If, however, your goal is that on App A, where the SLA is 200ms, ANY transaction slower than that should earn 0 points towards the Apdex score, then quite simply your Apdex T should be 50ms. Anything at or below 50ms (Apdex T) gets 1 point; anything between Apdex T and 4 x Apdex T gets 0.5 points, but at least still scores; and anything slower than 4 x Apdex T (in this scenario 200ms) gets 0 points. Transactions that violate the SLA are therefore marked as Frustrated.
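The SLA-driven rule above amounts to choosing T so that 4T equals the SLA. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the zone boundaries are the standard Apdex ones described in the paragraph.

```python
def apdex_t_for_sla(sla_ms: float) -> float:
    """Pick T so that 4T equals the SLA, making every SLA breach 'frustrated'."""
    return sla_ms / 4


def apdex_zone(duration_ms: float, t_ms: float) -> str:
    """Standard Apdex zones: <= T satisfied, <= 4T tolerating, otherwise frustrated."""
    if duration_ms <= t_ms:
        return "satisfied"
    if duration_ms <= 4 * t_ms:
        return "tolerating"
    return "frustrated"


t_a = apdex_t_for_sla(200)  # App A's 200ms SLA gives T = 50ms
for duration in (50, 120, 200, 201):
    print(f"{duration}ms -> {apdex_zone(duration, t_a)}")
```

A 200ms response still counts as tolerating, while 201ms is frustrated. The same rule would give T = 200ms for an 800ms SLA.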

    Apdex is a bit of an art, but you can definitely get where you need to be with either of the above approaches. I hope I covered the two scenarios I see as most likely in this case.