Search code examples
sqlquery-optimizationsubquery

Selecting COUNT from different criteria on a table


I have a table named 'jobs'. For a particular user a job can be active, archived, overdue, pending, or closed. Right now every page request is generating 5 COUNT queries and in an attempt at optimization I'm trying to reduce this to a single query. This is what I have so far but it is barely faster than the 5 individual queries. Note that I've simplified the conditions for each subquery to make it easier to understand, the full query acts the same however.

Is there a way to get these 5 counts in the same query without using the inefficient subqueries?

SELECT
  (SELECT count(*)
    FROM "jobs"
    WHERE
      jobs.creator_id = 5 AND
      jobs.status_id NOT IN (8,3,11) /* 8,3,11 being 'inactive' related statuses */
  ) AS active_count, 
  (SELECT count(*)
    FROM "jobs"
    WHERE
      jobs.creator_id = 5 AND
      jobs.due_date < '2011-06-14' AND
      jobs.status_id NOT IN(8,11,5,3) /* Grabs the overdue active jobs
                                      ('5' means completed successfully) */
  ) AS overdue_count,
  (SELECT count(*)
    FROM "jobs"
    WHERE
      jobs.creator_id = 5 AND
      jobs.due_date BETWEEN '2011-06-14' AND '2011-06-15 06:00:00.000000'
  ) AS due_today_count

This goes on for 2 more subqueries but I think you get the idea.

Is there an easier way to collect this data since it's basically 5 different COUNT's off of the same subset of data from the jobs table?

The subset of data is 'creator_id = 5', after that each count is basically just 1-2 additional conditions. Note that right now we're using Postgres but may be moving to MySQL in the near future. So if you can provide an ANSI-compatible solution I'd be gratetful :)


Solution

  • This is the typical solution. Use a case statement to break out the different conditions. If a record meets it gets a 1 else a 0. Then do a SUM on the values

      SELECT
        SUM(active_count) active_count,
        SUM(overdue_count) overdue_count
        SUM(due_today_count) due_today_count
      FROM 
      (
    
      SELECT 
        CASE WHEN jobs.status_id NOT IN (8,3,11) THEN 1 ELSE 0 END active_count,
        CASE WHEN jobs.due_date < '2011-06-14' AND jobs.status_id NOT IN(8,11,5,3)  THEN 1 ELSE 0 END  overdue_count,
        CASE WHEN jobs.due_date BETWEEN '2011-06-14' AND '2011-06-15 06:00:00.000000' THEN 1 ELSE 0 END  due_today_count
    
        FROM "jobs"
        WHERE
          jobs.creator_id = 5 ) t
    

    UPDATE As noted when 0 records are returned as t this result in as single result of Nulls in all the values. You have three options

    1) Add A Having clause so that you have No records returned rather than result of all NULLS

       HAVING SUM(active_count) is not null
    

    2) If you want all zeros returned than you could add coalesce to all your sums

    For example

     SELECT
          COALESCE(SUM(active_count)) active_count,
           COALESCE(SUM(overdue_count)) overdue_count
          COALESCE(SUM(due_today_count)) due_today_count
    

    3) Take advantage of the fact that COUNT(NULL) = 0 as sbarro's demonstrated. You should note that the not-null value could be anything it doesn't have to be a 1

    for example

     SELECT
          COUNT(CASE WHEN 
                jobs.status_id NOT IN (8,3,11) THEN 'Manticores Rock' ELSE NULL
           END) as [active_count]