Search code examples
sqlpostgresqlgreatest-n-per-groupsql-limit

Using a top 10 query to then search all records associated with them


I'm not super experienced with sql in general, and I'm trying to accomplish a pretty specific task- I want to first run a query to get the ID's of all my units with the top number of hits, and then from that run again to get the messages and counts of all the types of hits for those IDs in a specific time period.For the first query, I have this:

SELECT entity, count(entity) as Count
from plugin_status_alerts
where entered BETWEEN now() - INTERVAL '14 days' AND now()
group by entity
order by count(entity) DESC
limit 10

which results in this return:

"38792";3
"39416";2
"37796";2
"39145";2
"37713";2
"37360";2
"37724";2
"39152";2
"39937";2
"39667";2

The idea is to then use that result set to then run another query that orders by entity and status_code. I tried something like this:

SELECT status_code, entity, COUNT(status_code) statusCount
FROM plugin_status_alerts
where updated BETWEEN now() - INTERVAL '14 days' AND now() AND entity IN 
(SELECT id.entity, count(id.entity) as Count
from plugin_status_alerts id
where id.updated BETWEEN now() - INTERVAL '14 days' AND now()
group by id.entity
order by count(id.entity) DESC
limit 10
)
GROUP BY status_code, entity

but I get the error

ERROR: subquery has too many columns

I'm not sure if this is the route I should be going, or if maybe I should be trying a self join- either way not sure how to correct for whats happening now.


Solution

  • Use a JOIN instead of IN (subquery). That's typically faster, and you can use additional values from the subquery if you need to (like the total count per entity):

    SELECT entity, status_code, count(*) AS status_ct
    FROM  (
       SELECT entity  -- not adding count since you don't use it, but you could
       FROM   plugin_status_alerts
       WHERE  entered BETWEEN now() - interval '14 days' AND now()
       GROUP  BY entitiy
       ORDER  BY count(*) DESC, entitiy  -- as tie breaker to get stable result
       LIMIT  10
       ) sub
    JOIN   plugin_status_alerts USING (entity)
    WHERE  updated BETWEEN now() - interval '14 days' AND now()
    GROUP  BY 1, 2;
    

    Notes

    • If you don't have future entries by design, you can simplify:

      WHERE  entered > now() - interval '14 days'
      
    • Since the subquery only returns a single column (entity), which is merged with the USING clause, column names are unambiguous and we don't need table qualification here.

    • LIMIT 10 after you sort by the count is likely to be ambiguous. Multiple rows can tie for the 10th row. Without additional items in ORDER BY, Postgres returns arbitrary picks, which may or may not be fine. But the result of the query can change between calls without any changes to the underlying data. Typically, that's not desirable and you should add columns or expressions to the list to break ties.

    • count(*) is a bit faster than count(status_code) and doing the same - unless status_code can be null, in which case you would get 0 as count for this row (count() never returns null) instead of the actual row count, which is either useless or actively wrong. Use count(*) either way here.

    • GROUP BY 1, 2 is just syntactical shorthand. Details: