Search code examples
sqlamazon-redshiftunpivotunion-allamazon-quicksight

Same result without using UNION ALL on repeating SELECT statements?


I built a table in Amazon QuickSight, but the way the viewer wants the data organized in a pivot table, I'm only able to achieve with the below SQL snippet. I don't have a problem leaving it as is, but I'd like to see if there are better ways of doing what I did, especially without using the repetitive parts in each individual table that is being combined together. There are actually a lot more "problem" columns I'm querying, so the actual SQL is very long and very repetitive.

The only part that is changed is problem and problem_count between each table.

'Service' AS problem,
 COUNT(CASE WHEN service_yn = 'Yes' THEN 1 END) AS problem_count

I'm sorry, I don't know how to properly format a pivot table but I'll do my best here. Imagine the following 3 tables are one large pivot table.

Base Count
Accuracy of Bill 110
Service 200
Cleanliness 95
Value 75
Premium Count
Accuracy of Bill 110
Service 200
Cleanliness 95
Value 75
Deluxe Count
Accuracy of Bill 110
Service 200
Cleanliness 95
Value 75
SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Accuracy of bill' AS problem,
       COUNT(CASE WHEN accuracy_of_bill_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

UNION ALL

SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Cleanliness' AS problem,
       COUNT(CASE WHEN cleanliness_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

UNION ALL

SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Service' AS problem,
       COUNT(CASE WHEN service_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

UNION ALL

SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Value' AS problem,
       COUNT(CASE WHEN value_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

Solution

  • Note: this answer is for Postgres, as declared originally. Does not work for Redshift.

    You need the reverse operation of a "pivot", a.k.a. "unpivot". Can be done elegantly with CROSS JOIN LATERAL to a VALUES expression after getting all counts in a single SELECT.

    A single scan over the table should be substantially faster.
    While being at it, I optimized a couple other things.

    SELECT brand_group, p.*
    FROM  (
       SELECT CASE descriptor
                WHEN 'Fast Food'    THEN 'Base'
                WHEN 'Sit Down'     THEN 'Premium'
                WHEN 'Bar'          THEN 'Premium'
                WHEN 'Eatery'       THEN 'Premium'
                WHEN 'Full Service' THEN 'Deluxe'
                WHEN 'Boutique'     THEN 'Deluxe'
              END AS brand_group
            , count(*) FILTER (WHERE accuracy_of_bill_yn = 'Yes') AS a_ct
            , count(*) FILTER (WHERE cleanliness_yn = 'Yes')      AS c_ct
            , count(*) FILTER (WHERE service_yn = 'Yes')          AS s_ct
            , count(*) FILTER (WHERE value_yn = 'Yes')            AS v_ct
       FROM   surveys s
       WHERE  NOT EXISTS (SELECT FROM excluded e WHERE e.surveyid = s.surveyid)
       AND    responsedate >= now() - interval '12 months'
       AND    region IN (1,2,3,4,5)
       AND    brand_group IS NOT NULL
       GROUP  BY 1
       ) sub
    CROSS  JOIN LATERAL (
       VALUES
         ('Accuracy of bill', a_ct)
       , ('Cleanliness'     , c_ct)
       , ('Service'         , s_ct)
       , ('Value'           , v_ct)
       ) p(base, count)
    ORDER  BY 1, 2;  -- need that?
    

    fiddle

    See:

    Avoid NOT IN with subqueries. I replaced it with:

    WHERE NOT EXISTS (SELECT FROM excluded e WHERE e.surveyid = s.surveyid)
    

    See:

    All of your *_yn columns should really be boolean. Not text or anything else.

    About the aggregate FILTER clause:

    I use a "switched" CASE. Should be slightly cheaper. See:

    Just the base?

    Typically, people just want the more compact result from the base query (maybe with more descriptive column names):

    SELECT CASE descriptor
             WHEN 'Fast Food'    THEN 'Base'
             WHEN 'Sit Down'     THEN 'Premium'
             WHEN 'Bar'          THEN 'Premium'
             WHEN 'Eatery'       THEN 'Premium'
             WHEN 'Full Service' THEN 'Deluxe'
             WHEN 'Boutique'     THEN 'Deluxe'
           END AS brand_group
         , count(*) FILTER (WHERE accuracy_of_bill_yn = 'Yes') AS a_ct
         , count(*) FILTER (WHERE cleanliness_yn = 'Yes')      AS c_ct
         , count(*) FILTER (WHERE service_yn = 'Yes')          AS s_ct
         , count(*) FILTER (WHERE value_yn = 'Yes')            AS v_ct
    FROM   surveys s
    -- WHERE ... 
    GROUP  BY 1;
    

    fiddle