I'm implementing a metrics system for a backend API at scale and am running into a dilemma: using statsd
, the application itself is logging request metrics on a per endpoint basis, but the CPU metrics are at the global server level. Currently each server has 10 threads, meaning 10 requests can be processed at once (yeah, yeah its actually serial).
For example, if we have two endpoints, /user
and /item
, the statsd
implementation is differentiating statistics (DB/Redis I/O, etc.) per endpoint. However, say we are looking at linux-metrics
every N seconds
, those statistics do not separate endpoints, inherently.
I believe that it would be possible, assuming that your polling time ("N seconds
") is small enough and that you have enough diversity within your requests, to decompose the global system metrics to create an estimate at the endpoint level.
Image a scenario like this:
note: we'll say a
represents a GET to /user
and b
represents a GET to /item
|------|------|------|------|------|------|------|------|------|------|
| t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 |
|------|------|------|------|------|------|------|------|------|------|
| a | b | b | a | a | b | b | a | b | b |
| b | a | b | | b | a | b | | b | |
| a | b | b | | a | a | b | | a | |
| a | | b | | b | a | a | | a | |
| a | | b | | a | a | b | | | |
| | | | | a | | a | | | |
|------|------|------|------|------|------|------|------|------|------|
At every timestep, t
(i.e. t1
, t2
, etc.), we also take a snapshot of our system metrics. I feel like there should be a way (possibly through a sort of signal decomposition) to estimate the avg load each a
/b
request takes. Now, in practice I have ~20 routes so it would be far more difficult to get an accurate estimate. But like I said before, provided your requests have enough diversity (but not too much) so that they overlap in certain places like above, it should be at the very least possible to get a rough estimate.
I have to imagine that there is some name for this kind of thing or at the very least some research or naive implementations of this method. In practice, are there any methods that can achieve these kinds of results?
Note: it may be more difficult when considering that requests may bleed over these timesteps, but almost all requests take <250ms. Even if our system stats polling rate is every 5 seconds (which is aggressive), this shouldn't really cause problems. It is also safe to assume that we would be achieving at the very least 50 requests/second on each server, so sparsity of data shouldn't cause problems.
I believe the answer is doing a sum decomposition through linear equations. If we say that a system metric, for example the CPU, is a function CPU(t1)
, then it would just be a matter of solving the following set of equations for the posted example:
|------|------|------|------|------|------|------|------|------|------|
| t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 |
|------|------|------|------|------|------|------|------|------|------|
| a | b | b | a | a | b | b | a | b | b |
| b | a | b | | b | a | b | | b | |
| a | b | b | | a | a | b | | a | |
| a | | b | | b | a | a | | a | |
| a | | b | | a | a | b | | | |
| | | | | a | | a | | | |
|------|------|------|------|------|------|------|------|------|------|
4a + b = CPU(t1)
a + 2b = CPU(t2)
5b = CPU(t3)
a = CPU(t4)
3a + 3b = CPU(t5)
4a + b = CPU(t6)
2a + 4b = CPU(t7)
a = CPU(t8)
2a + 2b = CPU(t9)
b = CPU(t10)
Now, there will be more than one way to solve this equation (i.e. a = CPU(t8)
and a = CPU(t4)
), but if you took the average of a and b (AVG(a)
) from their corresponding solutions, you should get a pretty solid metric for this.