Search code examples
sqlsql-serverquery-optimizationunion

My query works but I need to make is scalable. I need to remove unions but not sure how to proceed


I need a list of the number of times a file has processed through our system. For example in week 1, for a given warehouseID, x number of files have processed 1 time, x number have processed 2 times, and so on. I have written a query that uses a temp table and multiple unions but its not scalable. For instance if a file is processed 42 times the query in its current form would be absurdly long. I thinkI need some self joins maybe? Or sub queries? But I'm having a really hard time sorting my next step out.

Currently I'm stacking it all in union statements. The answer is right and accurate. But my goal is to make it scalable and not rely on hard coding. I've tried rewriting using having count(*)>1 but that just goes back to hardcoding and using unions. There's got to be a more dynamic way.

Drop table If exists #a; 
select FileMasterID, datepart(ww, complete_dttm) AS WeekNum, count(1) as TimesProcessed, whid
INTO #a
from Reporting.Log
where Complete_dttm > '2018-12-31'
Group By FileMasterID, datepart(ww, complete_dttm), whid;

/*
This is the reporting output of the query to answer how many files have been processed once, twice and so on.
*/

select whid, COUNT(1) AS FileCount, 'Processed 1 time' AS PT from #a where TimesProcessed=1 GROUP BY whid UNION ALL
select whid, COUNT(1) AS FileCount, 'Processed 2 times' AS PT from #a where TimesProcessed=2 GROUP BY whid UNION ALL
select whid, COUNT(1) AS FileCount, 'Processed 3 times' AS PT from #a where TimesProcessed=3 GROUP BY whid UNION ALL
select whid, COUNT(1) AS FileCount, 'Processed 4 times' AS PT from #a where TimesProcessed=4 GROUP BY whid UNION ALL
select whid, COUNT(1) AS FileCount, 'Processed 5 times' AS PT from #a where TimesProcessed=5 GROUP BY whid UNION ALL
select whid, COUNT(1) AS FileCount, 'Processed 6 times' AS PT from #a where TimesProcessed=6 GROUP BY whid 
Order by whid,  PT

Looking for a count of how many files processed 1 time, 2 times, 3 times, and so on.

whid    FileCount   PT
1       82108       Processed 1 time
1       3203        Processed 2 times
1       78          Processed 3 times
1       13          Processed 4 times
2       78702       Processed 1 time
2       1209        Processed 2 times

Solution

  • You seem to be looking for a simple aggregated query. The exact syntax might slightly change depending on your RDBMS (namely, whether it supports aliases in the GROUP BY clause or not), but it should look like:

    SELECT
        whid,
        COUNT(*) FileCount,
        CONCAT('Processed ', TimesProcessed, ' time(s)') PT
    FROM #a
    GROUP BY whid, PT
    -- OR : GROUP BY whid, CONCAT('Processed ', TimesProcessed, ' time(s)')
    

    If you want to limit the query to a specific list of TimesProcessed values, you can add a WHERE clause:

    SELECT
        whid,
        COUNT(*) FileCount,
        CONCAT('Processed ', TimesProcessed, ' time(s)') PT
    FROM #a
    WHERE TimesProcessed IN (1, 2, 4, 5, 6) -- or some other condition
    GROUP BY whid, PT