Search code examples
google-bigqueryaggregate-functions

Inclusion of nulls with ANY_VALUE in BigQuery


I have a 'vendors' table that looks like this...

**company itemKey itemPriceA itemPriceB**
companyA, 203913, 20, 10
companyA, 203914, 20, 20
companyA, 203915, 25, 5
companyA, 203916, 10, 10

It has potentially millions of rows per company and I want to query it to bring back a representative delta between itemPriceA and itemPriceB for each company. I don't care which delta I bring back as long as it isn't zero/null (like row 2 or 4), so I was using ANY_VALUE like this...

SELECT company
, ANY_VALUE(CASE WHEN (itemPriceA-itemPriceB)=0 THEN null ELSE (itemPriceA-itemPriceB) END)
FROM vendors
GROUP BY 1

It seems to be working but I notice 2 sentences that seem contradictory from Google's documentation...

"Returns NULL when expression is NULL for all rows in the group. ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected."

If ANY_VALUE returns null "when expression is NULL for all rows in the group" it should NEVER return null for companyA right (since only 2 of 4 rows are null)? But the second sentence sounds like it will indeed include the null rows.

P.s. you may be wondering why I don't simply add a WHERE clause saying "WHERE itemPriceA-itemPriceB>0" but in the event that a company has ONLY matching prices, I still want the company to be returned in my results.


Solution

  • This is an explanation about how “any_value works with null values”.

    With any_value always return the first value, if there is a value different from null.

    SELECT ANY_VALUE(fruit) as any_value
    FROM UNNEST([null, "banana",null,null]) as fruit;
    

    Return null if all rows have null values. Refers at this sentence

    “Returns NULL when expression is NULL for all rows in the group”

    SELECT ANY_VALUE(fruit) as any_value
    FROM UNNEST([null, null, null]) as fruit
     
    

    Return null if one value is null and you specified in the where clause. Refers to these sentences

    “ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected.”

    SELECT ANY_VALUE(fruit) as any_value
    FROM UNNEST(["apple", "banana", null]) as fruit
    where fruit is null
    

    Always depends which filter you are using and the field inside the any_value.

    You can see this example, return two rows that are different from 0.

    SELECT ANY_VALUE(e).company, (itemPriceA-itemPriceB) as value
    FROM `vendor` e
    where (itemPriceA-itemPriceB)!=0
    group by e.company