Search code examples
sqlgoogle-bigquerydistinct

It seems my SELECT DISTINCT statement isn't processing when I view my results (BigQuery SQL)


(Using BigQuery SQL) I am trying to find the 10 regions that had the highest prices, and what date the price occurred on. The table has columns for Date, Region, AveragePrice, and other information not relevant to my query. The dataset has an entry for every Region and every day of the year.

My query is:

SELECT
    DISTINCT Region,
    max(AveragePrice) as highest_price,
    date
FROM avocado_data.avocado_prices as avocado_prices
GROUP BY
    avocado_prices.Region, date
ORDER BY highest_price DESC
LIMIT 10

The results I get are:

  1. SanFrancisco, 3.25, 2016-10-30
  2. Tampa, 3.17, 2017-04-16
  3. SanFrancisco, 3.12, 2016-11-06
  4. MiamiFtLauderdale, 3.05, 2017-03-12
  5. RaleighGreensboro, 3.04, 2017-08-27
  6. LasVegas, 3.03, 2016-10-02
  7. SanFrancisco, 3.0, 2017-08-27
  8. RaleighGreensboro, 3.0, 2017-10-01
  9. Jacksonville, 2.99, 2017-10-01
  10. SanFrancisco, 2.99, 2016-11-13

I would like to see 10 distinct Regions but SF and Raleigh are listed multiple times.

I have tried nesting this query inside the FROM statement of another query to just pull from this table, but I still get the same result.

I can also get the distinct Regions I want if I remove all references to Date in the query, but then I obviously can't see what date the highest price in each Region occurred:

SELECT
    DISTINCT Region,
    max(AveragePrice) as highest_price,
FROM avocado_data.avocado_prices as avocado_prices
GROUP BY
    avocado_prices.Region
ORDER BY highest_price DESC
LIMIT 10
  1. SanFrancisco, 3.25
  2. Tampa, 3.17
  3. MiamiFtLauderdale, 3.05
  4. RaleighGreensboro, 3.04
  5. LasVegas, 3.03
  6. Jacksonville, 2.99
  7. Seattle, 2.96
  8. Spokane, 2.95
  9. WestTexNewMexico, 2.93
  10. Orlando, 2.87

The table above is exactly what I'm looking for, along with the date of that price.


Solution

  • You can use Navigation function FIRST_VALUE for that

    Sample Data

    Region AveragePrice date
    SanFrancisco 3.25 2022-05-12
    SanFrancisco 4.25 2022-05-13
    SanFrancisco 2.25 2022-05-14
    Tampa 4.75 2022-05-12
    Tampa 3.75 2022-05-13
    Tampa 2.25 2022-05-14

    Query

    SELECT
      DISTINCT
      Region,
      MAX(AveragePrice) OVER (PARTITION BY Region) AS highest_price,
      FIRST_VALUE(date) OVER (PARTITION BY Region ORDER BY AveragePrice DESC) AS occured_on
    FROM table1
    

    Output:

    Region highest_price occured_on
    SanFrancisco 4.25 2022-05-13
    Tampa 4.75 2022-05-12