I'm having trouble to understand how to use the AVG function in the following scenario,
I have a table called requests
, it has columns app_name
(STRING), operation_status
(INTEGER status codes like 200, 500, etc.), timestamp
(INTEGER, this is a milliseconds timestamp), correlated_id
(STRING, this is an UUID).
So for example if I have the following records:
INSERT INTO `requests`
(app_name, operation_status, timestamp, correlated_id)
VALUES
("APP_1", 200, 1714207107972, '82156a64-e43f-467f-894d-bf28dc007e2b'),
("APP_1", 500, 1714207107972, '51345b32-a21d-215c-432a-bc13dc115e1c'),
("APP_2", 200, 1714207107972, '41123n43-b15d-134a-673e-ad13ab104a1c'),
("APP_2", 500, 1714207107972, '63246a54-a32g-324c-673e-ad13ab104a1c'),
("APP_3", 200, 1714207107972, '51135b32-b13a-213f-532f-zc09dg213k0h');
I want a query that tells me the Average number of 200 requests that happen per application, so for this I was trying the following query:
SELECT
app_name,
COUNTIF(operation_status = 200) AS successful_ops,
COUNTIF(operation_status = 500) AS failed_ops,
AVG(CASE WHEN operation_status = 200 THEN operation_status END) AS avg_status_200, -- this is not working
(COUNTIF(operation_status=200)/COUNT(operation_status)) AS average_manual -- this is giving me the correct result but I think AVG function should be able to give me this too
FROM
`requests`
GROUP BY
app_name;
So for example the correct value for the APP_1 would be 0.5, cause it has 2 requests but only 1 is 200, so 1/2 = 0.5, this is the result I get with "average_manual" in the query, but I think that AVG
function should also be able to give me this result but I can not figure out how is the proper way, appreciate any help
The official definition of a calculation of average is defined by:
The sum of all observations, divided by the total number of observations
So basically, in your example, what you are doing is:
What you are trying to calculate, is the proportion of requests that resulted with a response code of 200, out of all the requests for that app.
That is why using the AVG method does not work for you - since you are using the response code as a number in the calculation, rather than an "indicator".
There are several ways to achieve your required result, one of them being the example that you gave, written as "average manual".
Another option is to use the AVG function in combination with a CASE statement, like so:
SELECT
app_name,
COUNTIF(operation_status = 200) as successful_ops,
COUNTIF(operation_status = 500) as failed_ops,
AVG(CASE operation_status WHEN 200 THEN 1 ELSE 0 END) successful_ops_proportion_first_option,
AVG(CASE WHEN operation_status = 200 THEN 1 ELSE 0 END) successful_ops_proportion_second_option,
COUNTIF(operation_status = 200) / COUNT(operation_status) as average_manual
FROM
`requests`
GROUP BY
app_name
Which will result in the following table (note how the 3 last columns are all identical, BigQuery has 2 types of CASE syntax, and we have the manual proportion calculation after):