The schema for my table is as follows
FIELD NAME TYPE
test_id string
samples repeated
error1 float
error2 float
error2_is_valid boolean
I'd like to calculate the mean average error (mae) but only taking valid samples into account. So I need the following (pseudocode)
error2_mae = 0
error2_valid_samples = 0
for sample in samples:
if sample.error2_is_valid:
error2_mae += sample.error2
error2_valid_samples += 1
error2_mae /= error2_valid_samples
Here's the sql statement I'm using but I'm missing said if condition
SELECT test_id, AVG(ABS(samples.error2)) AS mae
FROM <table-name> as t, UNNEST(t.samples) as samples
GROUP BY test_id
Edit: I tried using the WHERE
clause but that breaks down when there are two different errors, where only error2 needs the conditional includes, error1 should take all into account...
SELECT test_id, AVG(ABS(samples.error1) AS mae1, AVG(ABS(samples.error2)) AS mae2
FROM <table-name> as t, UNNEST(t.samples) as samples
WHERE samples.error2_is_valid
GROUP BY test_id
Thank you
Just add the condition on the struct field to the where
clause:
SELECT t.test_id, AVG(ABS(s.error)) AS mae
FROM <table-name> as t
CROSS JOIN UNNEST(t.samples) as s
WHERE s.is_valid
GROUP BY t.test_id
This will filter out tests whose all results are invalid. If you still want these tests in the resultset (say with a null
mae), then we can do conditional aggregation instead:
SELECT t.test_id,
AVG(CASE WHEN s.is_valid THEN ABS(s.error) END) AS mae
FROM <table-name> as t
CROSS JOIN UNNEST(t.samples) as s
GROUP BY t.test_id