Search code examples
sqlgoogle-bigqueryaggregate-functionsunnest

sql bigquery calculate mae with if condition on one specific column


The schema for my table is as follows

FIELD NAME                TYPE
test_id                   string
samples                   repeated
    error1                float
    error2                float
    error2_is_valid       boolean

I'd like to calculate the mean average error (mae) but only taking valid samples into account. So I need the following (pseudocode)

error2_mae = 0
error2_valid_samples = 0
for sample in samples:
    if sample.error2_is_valid:
        error2_mae += sample.error2
        error2_valid_samples += 1
error2_mae /= error2_valid_samples

Here's the sql statement I'm using but I'm missing said if condition

SELECT test_id, AVG(ABS(samples.error2)) AS mae
FROM <table-name> as t, UNNEST(t.samples) as samples
GROUP BY test_id 

Edit: I tried using the WHERE clause but that breaks down when there are two different errors, where only error2 needs the conditional includes, error1 should take all into account...

SELECT test_id, AVG(ABS(samples.error1) AS mae1, AVG(ABS(samples.error2)) AS mae2
FROM <table-name> as t, UNNEST(t.samples) as samples
WHERE samples.error2_is_valid
GROUP BY test_id 

Thank you


Solution

  • Just add the condition on the struct field to the where clause:

    SELECT t.test_id, AVG(ABS(s.error)) AS mae
    FROM <table-name> as t
    CROSS JOIN UNNEST(t.samples) as s
    WHERE s.is_valid
    GROUP BY t.test_id 
    

    This will filter out tests whose all results are invalid. If you still want these tests in the resultset (say with a null mae), then we can do conditional aggregation instead:

    SELECT t.test_id, 
        AVG(CASE WHEN s.is_valid THEN ABS(s.error) END) AS mae
    FROM <table-name> as t
    CROSS JOIN UNNEST(t.samples) as s
    GROUP BY t.test_id