Search code examples
sqlsqlitewhere-clausegreatest-n-per-grouparcgis

Using a WHERE clause subquery, select the greatest 1 per group with a tie-breaker


I have a RoadInsp table in a SQLite 3.38.2 database. I've put the data in a CTE for purpose of this question:

with roadinsp (objectid, asset_id, date_, condition) as (
values
(1, 1, '2016-04-01', 20),
(2, 1, '2019-03-01', 19),
(3, 1, '2022-01-01', 18),
  
(4, 2, '2016-04-01', 17),
(5, 2, '2022-01-01', 16),
  
(6, 3, '2022-03-01', 15),  --duplicate date
(7, 3, '2022-03-01', 14),  --duplicate date
(8, 3, '2019-01-01', 13)
)  
select * from roadinsp

objectid  asset_id      date_   condition
       1         1  2016-04-01         20
       2         1  2019-03-01         19
       3         1  2022-01-01         18

       4         2  2016-04-01         17
       5         2  2022-01-01         16

       6         3  2022-03-01         15
       7         3  2022-03-01         14
       8         3  2019-01-01         13

demo


I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.

I want to select the greatest 1 per group using a WHERE clause. In other words, for each ASSET_ID, I want to select the row that has the latest date.

I can achieve that using a WHERE clause expression like this:

date_ = 
  (select max(subq.date_) from roadinsp subq where roadinsp.asset_id = subq.asset_id)

objectid  asset_id      date_   condition
       3         1  2022-01-01         18
       5         2  2022-01-01         16
       6         3  2022-03-01         15
       7         3  2022-03-01         14

That works, but it selects two rows for asset #3, since there are two rows with the same date for that asset.

So I want to break the tie by selecting the row that has the highest condition value. It would look like this:

objectid  asset_id  date_       condition
       3         1  2022-01-01         18
       5         2  2022-01-01         16
       6         3  2022-03-01         15  --this row has a higher condition value than the other duplicate row.
                                           --so the other duplicate row was omitted.

I only ever want to select one row per asset. So if there are duplicate condition values too, then it doesn't matter what condition is selected, as long as only one row is selected.


Using a WHERE clause subquery, how can I select the greatest 1 per group, and break ties using the row that has the greatest condition?


Solution

  • I would use IN() with ORDER BY and LIMIT to check the primary key...

    SELECT * 
      FROM roadinsp r
     WHERE objectid IN (
             SELECT objectid
               FROM roadinsp
              WHERE asset_id = r.asset_id
           ORDER BY date_ DESC, condition DESC
              LIMIT 1
           )
    

    This is easily extended to n rows per asset. And additional conditions can easily be added to the ORDER BY.

    Demo