Search code examples
sqlpostgresqlselectunion-all

Combining 3 SELECT statements to output 1 table


I have three queries with results.

Query 1:

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 1 AND workid < 557 AND workid > 188
GROUP BY employeeid, clientid ORDER BY clientid ASC;

employeeid | clientid | under_over_1
------------+----------+--------------
          1 |        3 |         0.54
          1 |        4 |         0.47
          1 |        6 |         0.45
          1 |        7 |         0.59
          . |        . |           .
          . |        . |           .

Query 2:

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 2
GROUP BY employeeid, clientid ORDER BY clientid ASC;

 employeeid | clientid | under_over_1
------------+----------+--------------
          2 |        2 |         1.01
          2 |        3 |         0.21
          2 |        4 |         0.71
          2 |        6 |         0.68
          . |        . |           .
          . |        . |           .

Query: 3

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 3
GROUP BY employeeid, clientid ORDER BY clientid ASC;

 employeeid | clientid | under_over_1
------------+----------+--------------
          3 |        4 |         0.70
          3 |        6 |         0.54
          3 |        7 |         1.03
          3 |       11 |         0.74
          . |        . |           .
          . |        . |           .

I want to output one table with the result of all three queries such as (I am sorry but I have to write more here so I can submit this post. I hope this is enough;-)):

employeeid | clientid | under_over_1
------------+----------+--------------
          1 |        3 |         0.54
          1 |        4 |         0.47
          1 |        6 |         0.45
          1 |        7 |         0.59
          . |        . |           .
          . |        . |           .
          2 |        2 |         1.01
          2 |        3 |         0.21
          2 |        4 |         0.71
          2 |        6 |         0.68
          . |        . |           .
          . |        . |           .
          3 |        4 |         0.70
          3 |        6 |         0.54
          3 |        7 |         1.03
          3 |       11 |         0.74
          . |        . |           .
          . |        . |           .

I tried UNION ALL as in

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 1 AND workid < 557 AND workid > 188
GROUP BY employeeid, clientid ORDER BY clientid ASC

UNION ALL

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 3
GROUP BY employeeid, clientid ORDER BY clientid ASC

UNION ALL

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 2
GROUP BY employeeid, clientid ORDER BY clientid ASC;

However, I get the following error:

ERROR:  syntax error at or near "UNION"
LINE 7: UNION ALL

I am not sure why this is wrong or whether UNION ALL is the right approach here. Anyone have a clue?


Solution

  • The immediate cause for the error is, quoting the manual:

    (ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses. Without parentheses, these clauses will be taken to apply to the result of the UNION, not to its right-hand input expression.)

    Bold emphasis mine.

    So, as suggested by @wilx, enclosing each SELECT in parentheses would fix that.

    But there is more.

    Merge into single query

    SELECT employeeid, work.clientid    -- no DISTINCT
         , ROUND ((AVG(current_lawn_price)
                 / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
    FROM   work
    JOIN   timesheet USING (date_linkid)
    JOIN   client    USING (clientid)
    WHERE (employeeid IN (1, 2) OR
           employeeid = 3
       AND workid < 557
       AND workid > 188)
    GROUP  BY employeeid, clientid 
    ORDER  BY CASE employeeid
                WHEN 1 THEN 1
                WHEN 2 THEN 3
                WHEN 3 THEN 2
              END
            , clientid;
    

    Merge the 3 SELECT queries.

    Remove the redundant DISTINCT. That's an expensive no-op after GROUP BY.

    Instead of extracting hours and minutes from both begin and end timestamps etc., compute the interval by plain subtraction (works with timestamp or time values alike) and extract the epoch from it. Gives you the number of seconds. Divide by 60 and you got the number of minutes much faster. 79 being the result of 60 * 1.31666666666667, accordingly.

    The manual about extracting epoch:

    For timestamp with time zone values, the number of seconds since 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values, the number of seconds since 1970-01-01 00:00:00 local time; for interval values, the total number of seconds in the interval

    Bold emphasis mine.

    Since this removes UNION ALL, the parentheses mentioned at the top are not needed any more.

    The CASE expression makes up for the mixed order in employeeid, like sticky bit provided.

    If queries cannot be merged

    If, for some reason, you can't or won't merge the three original SELECT queries, do this instead:

    (  -- parentheses required
    SELECT employeeid, work.clientid    -- no DISTINCT !
         , ROUND ((AVG(current_lawn_price)
                 / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
    FROM   work
    JOIN   timesheet USING (date_linkid)
    JOIN   client    USING (clientid)
    WHERE  employeeid = 1
    AND    workid < 557 
    AND    workid > 188
    GROUP  BY clientid  -- no need to GROUP BY employeeid while filtering single value
    ORDER  BY clientid
    )
    
    UNION ALL
    (
    SELECT employeeid, work.clientid
         , ROUND ((AVG(current_lawn_price)
                 / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
    FROM   work
    JOIN   timesheet USING (date_linkid)
    JOIN   client    USING (clientid)
    WHERE  employeeid = 3
    GROUP  BY clientid
    ORDER  BY clientid
    )
    
    UNION ALL
    (
    SELECT employeeid, work.clientid
         , ROUND ((AVG(current_lawn_price)
                 / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
    FROM   work
    JOIN   timesheet USING (date_linkid)
    JOIN   client    USING (clientid)
    WHERE  employeeid = 2
    GROUP  BY clientid
    ORDER  BY clientid
    );
    -- no outer ORDER BY required
    

    Keep ORDER BY per SELECT and add parentheses to fix the syntax. UNION ALL (as opposed to UNION) simply appends results preserving the order of individual SELECTs. This should be cheaper than ordering the whole set after UNION ALL. And you did want to keep using queries individual SELECT queries "as is" ...

    Most of the other advice above applies accordingly.

    Aside: make it a habit to use table aliases and table-qualify all columns in queries joining multiple tables. Much more robust against later changes and easier to read / debug.