My IIS server has always experienced strange, intermittent 20-30s lag spikes on fairly standard API calls in my C# app. I've added the server details below in case they are relevant.
To see whether my app was at fault, I added AWS X-Ray for monitoring, along with my own subsegments to see whether specific areas of code were to blame. According to that data, each of these areas runs fine and is not the cause of the latency.
Whenever I analyze calls that took longer than expected, such as a recent 3.48s call that should normally take 300ms (see screenshot below), the evidence is always the same: the first row in X-Ray shows a large duration (anywhere from 3s to 30s), while the details below it, once expanded, are all in the range of milliseconds and come nowhere near adding up to that total.
How should I interpret this? When the overall call takes that long but the expanded trace details are in milliseconds, does it simply mean that my app code is fine and that something around it (network latency, worker processes, the web server, etc.) is to blame?
I'm just trying to understand where I need to start looking if that makes sense.
IIS Server: The server is an EC2 t2.medium on AWS, and traffic is negligible (perhaps 100 API calls a day). I have a single server due to the low load. On the app pool, maximum worker processes is set to 7, queue length to 1000, and start mode to AlwaysRunning. I've not done much by way of fine-tuning the server.
Thank you so much for your time and any guidance.
You can look at the raw trace data in the console to see the start and end time of each segment and subsegment. This can guide you to where you should focus. If there is a large gap between the start time of the segment and the start time of the first subsegment, then you should focus on whatever runs at the start of the segment, before your first subsegment begins.
Click on a trace in the AWS X-Ray console, then click "Raw Data" in the top right.
You should see the "start_time" and "end_time" of the segments and subsegments.
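If reading the timestamps by eye gets tedious, the comparison can be scripted. Below is a minimal sketch in Python, assuming you have copied a single segment document out of the raw data into a dict with "start_time", "end_time" and a "subsegments" list (times are epoch seconds). The function name `find_gaps`, the subsegment names and the sample timestamps are made up for illustration and are not from your trace.

```python
def find_gaps(segment, min_gap=0.0):
    """Return (label, seconds) pairs for stretches of the segment that are
    not covered by any of its top-level subsegments."""
    subs = sorted(segment.get("subsegments", []), key=lambda s: s["start_time"])
    gaps = []
    cursor = segment["start_time"]  # end of the time accounted for so far
    for sub in subs:
        gap = sub["start_time"] - cursor
        if gap > min_gap:
            gaps.append(("before " + sub["name"], gap))
        # An in-progress subsegment may have no end_time; keep the cursor as is.
        cursor = max(cursor, sub.get("end_time", cursor))
    tail = segment["end_time"] - cursor
    if tail > min_gap:
        gaps.append(("after last subsegment", tail))
    return gaps


# Hypothetical segment: 3.5s in total, but the subsegments only cover 0.35s.
segment = {
    "name": "MyApi",
    "start_time": 100.0,
    "end_time": 103.5,
    "subsegments": [
        {"name": "LoadUser", "start_time": 103.1, "end_time": 103.2},
        {"name": "QueryDb", "start_time": 103.2, "end_time": 103.45},
    ],
}

for label, seconds in find_gaps(segment):
    print(f"{label}: {seconds:.3f}s")
```

In this made-up example almost all of the time falls before the first subsegment, which is the pattern described above: the delay happens at the very start of the segment, before any of your instrumented code runs.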