Search code examples
sqlpostgresqlselectcommon-table-expressionpostgresql-performance

Better way than multiple SELECT statements?


I'm creating a web app that displays a pie chart. In order to get all the data for the chart from a PostgreSQL 9.3 database in a single HTTP request, I'm combining multiple SELECT statements with UNION ALL — here's a portion:

SELECT 'spf' as type, COUNT(*)
    FROM (SELECT cai.id
          FROM common_activityinstance cai
          JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
          JOIN common_activitysetting cas ON cas.id = cais.id
          JOIN quizzes_quiz q ON q.id = cai.activity_id
          WHERE cai.end_time::date = '2015-09-12'
          AND q.name != 'Exit Ticket Quiz'
          AND cai.activity_type = 'QZ'
          AND (cas.key = 'disable_student_nav' AND cas.value = 'True'
            OR cas.key = 'pacing' AND cas.value = 'student')
          GROUP BY cai.id
          HAVING COUNT(cai.id) = 2) sub
UNION ALL
SELECT 'spn' as type, COUNT(*)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-12'
    AND cai.activity_type = 'QZ'
    AND cas.key = 'disable_student_nav'
    AND cas.value = 'False'
UNION ALL
SELECT 'tp' as type, COUNT(*)
    FROM (SELECT cai.id 
          FROM common_activityinstance cai
          JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
          JOIN common_activitysetting cas ON cas.id = cais.id
          WHERE cai.end_time::date = '2015-09-12'
          AND cai.activity_type = 'QZ'
          AND cas.key = 'pacing' AND cas.value = 'teacher') sub;

This produces a nice, small response for sending back to the client:

 type |  count 
------+---------
 spf  |  100153
 spn  |   96402
 tp   |   84211

I wonder if my queries can be made more efficient. Each SELECT statement uses mostly the same JOIN operations. Is there a way to not repeat the JOIN for each new SELECT?
And I would actually prefer a single row with 3 columns.

Or, in general, is there some entirely different but better approach than what I'm doing?


Solution

  • You can bundle most of the cost in a single main query in a CTE and reuse the result several times.
    This returns a single row with three columns named after each type (as requested in the comment):

    WITH cte AS (
       SELECT cai.id, cai.activity_id, cas.key, cas.value
       FROM   common_activityinstance cai
       JOIN   common_activityinstance_settings s ON s.activityinstance_id = cai.id
       JOIN   common_activitysetting cas ON cas.id = s.id
       WHERE  cai.end_time::date = '2015-09-12'   -- problem?
       AND    cai.activity_type = 'QZ'
       AND   (cas.key = 'disable_student_nav' AND cas.value IN ('True', 'False') OR
              cas.key = 'pacing' AND cas.value IN ('student', 'teacher'))
       )
    SELECT *
    FROM  (
       SELECT count(*) AS spf
       FROM  (
          SELECT c.id
          FROM   cte c
          JOIN   quizzes_quiz q ON q.id = c.activity_id
          WHERE  q.name <> 'Exit Ticket Quiz'
          AND   (c.key, c.value) IN (('disable_student_nav', 'True')
                                   , ('pacing', 'student'))
          GROUP  BY 1
          HAVING count(*) = 2
          ) sub
       ) spf
    ,  (
       SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
            , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
       FROM   cte
       ) spn_tp;
    

    Should work for Postgres 9.3. In Postgres 9.4 use the new aggregate FILTER clause:

      count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn
    , count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher') AS tp
    

    Details for both syntax variants:

    The condition marked problem? may be big performance problem, depending on the data type of cai.end_time. For one, it's not sargable. And if it's a timestamptz type, the expression is hard to index, because the result depends on the current time zone setting of the session - which can also lead to different results when executed in different time zones.

    Compare:

    You just have to name the time zone that is supposed to define your date. Taking my time zone in Vienna as example:

    WHERE  cai.end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' 
    AND    cai.end_time <  '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
    

    You can provide simple timestamptz values as well. You could even just:

    WHERE  cai.end_time >= '2015-09-12'::date
    AND    cai.end_time <  '2015-09-12'::date + 1
    

    But the first variant does not depend on the current time zone setting.
    Detailed explanation in the links above.

    Now the query can use your index and should be much faster if there are many different days in your table.