Search code examples
sqlsql-servergroup-bysubqueryrollup

Aggregation: Category missing in SQL subquery


my base data:

Process ID Location Date Timeliness
2030608 New York May 24 in time
2067393 Ohio May 24 overdue
1329306 Ohio May 24 in time
1740814 Ohio June 24 overdue
1924676 Chicago May 24 overdue
1985313 Chicago May 24 overdue
1715694 San Fransisco June 24 in time
1981887 San Fransisco June 24 overdue
1752416 San Fransisco July 24 overdue
1671006 New York May 24 in time
2030607 New York June 24 in time

and I want to aggregate it to use it in a visualisation tool.

Current output of my aggregation query:

Location Date intime total percentage
New York May 24 3 3 100
Ohio May 24 1 2 50
San Fransisco June 24 1 2 50

My problem: certain data drops out of the aggregation because there are no intime processes at that location and month.

The output that I need: (follows directly from the base data above)

Location Date intime total percentage
New York May 24 3 3 100
Ohio May 24 1 2 50
Ohio June 24 0 1 0
San Fransisco June 24 1 2 50
San Fransisco July 24 0 1 0
Chicago May 24 0 2 0

The query that I am currently using for aggregation is:

SELECT
    gesamt2.Location,
    gesamt2.date,
    intime2.anzahl AS intime,
    gesamt2.anzahl AS total,

    CAST(intime2.anzahl AS float) / CAST(gesamt2.anzahl AS float) * 100 AS percentage

FROM
    (
        SELECT
            location,
            anzahl,
            date
        FROM
            (
                SELECT
                    Location,
                    Date,
                    timeliness,
                    COUNT(1) AS anzahl
                FROM
                    basedata
                GROUP BY ROLLUP (Location, Date, Timeliness)
            ) AS gesamt
        WHERE
            (timeliness IS NULL)
    ) AS gesamt2
    INNER JOIN
    (
        SELECT
            mapping, anzahl, date
        FROM
            (
                SELECT
                    Location,
                    Date,
                    timeliness,
                    COUNT(1) AS anzahl
                FROM
                    basedata
                WHERE
                    (BASE_FINAL_APPROVAL > GETDATE() - 366)
                GROUP BY ROLLUP (Location, Date, timeliness)
            ) AS intime
        WHERE
            (timeliness = 'in time')
    ) AS intime2 ON
        gesamt2.Location = intime2.Location
        AND
        gesamt2.date = intime2.date

Solution

  • The query is much simpler if you use conditional aggregation.

    SELECT *, ROUND(intime/total*100, 0) as percentage
    FROM
        (SELECT Location, 
               [Date], 
               SUM(CASE WHEN Timeliness='in time' THEN 1 ELSE 0 END) as intime,
               COUNT(*) as total
        FROM YourTable
        GROUP BY Location, [Date]) agg 
    ;      
    
    

    or with CTE

    WITH agg AS 
    (SELECT Location, 
           [Date], 
           SUM(CASE WHEN Timeliness='in time' THEN 1 ELSE 0 END) as intime,
           COUNT(*) as total
    FROM YourTable
    GROUP BY Location, [Date])  
    
    SELECT *, ROUND(intime/total*100, 0) percentage
    FROM agg
    

    The above was not tested in SQL Server.

    MySQL Demo