I'm not super experienced with sql in general, and I'm trying to accomplish a pretty specific task- I want to first run a query to get the ID's of all my units with the top number of hits, and then from that run again to get the messages and counts of all the types of hits for those IDs in a specific time period.For the first query, I have this:
SELECT entity, count(entity) as Count
from plugin_status_alerts
where entered BETWEEN now() - INTERVAL '14 days' AND now()
group by entity
order by count(entity) DESC
limit 10
which results in this return:
"38792";3
"39416";2
"37796";2
"39145";2
"37713";2
"37360";2
"37724";2
"39152";2
"39937";2
"39667";2
The idea is to then use that result set to then run another query that orders by entity and status_code. I tried something like this:
SELECT status_code, entity, COUNT(status_code) statusCount
FROM plugin_status_alerts
where updated BETWEEN now() - INTERVAL '14 days' AND now() AND entity IN
(SELECT id.entity, count(id.entity) as Count
from plugin_status_alerts id
where id.updated BETWEEN now() - INTERVAL '14 days' AND now()
group by id.entity
order by count(id.entity) DESC
limit 10
)
GROUP BY status_code, entity
but I get the error
ERROR: subquery has too many columns
I'm not sure if this is the route I should be going, or if maybe I should be trying a self join- either way not sure how to correct for whats happening now.
Use a JOIN
instead of IN (subquery)
. That's typically faster, and you can use additional values from the subquery if you need to (like the total count per entity
):
SELECT entity, status_code, count(*) AS status_ct
FROM (
SELECT entity -- not adding count since you don't use it, but you could
FROM plugin_status_alerts
WHERE entered BETWEEN now() - interval '14 days' AND now()
GROUP BY entitiy
ORDER BY count(*) DESC, entitiy -- as tie breaker to get stable result
LIMIT 10
) sub
JOIN plugin_status_alerts USING (entity)
WHERE updated BETWEEN now() - interval '14 days' AND now()
GROUP BY 1, 2;
If you don't have future entries by design, you can simplify:
WHERE entered > now() - interval '14 days'
Since the subquery only returns a single column (entity
), which is merged with the USING
clause, column names are unambiguous and we don't need table qualification here.
LIMIT 10
after you sort by the count is likely to be ambiguous. Multiple rows can tie for the 10th row. Without additional items in ORDER BY
, Postgres returns arbitrary picks, which may or may not be fine. But the result of the query can change between calls without any changes to the underlying data. Typically, that's not desirable and you should add columns or expressions to the list to break ties.
count(*)
is a bit faster than count(status_code)
and doing the same - unless status_code
can be null, in which case you would get 0
as count for this row (count()
never returns null) instead of the actual row count, which is either useless or actively wrong. Use count(*)
either way here.
GROUP BY 1, 2
is just syntactical shorthand. Details: