Summarising data including zeros for where no data exists


I have a table containing a list of years, from 2010 onwards, along the lines of:

CREATE TABLE Years
(
    YearName int
);
    
INSERT INTO Years (YearName)
VALUES
    (2010), (2011), (2012), (2013),
    (2014), (2015), (2016), (2017),
    (2018), (2019), (2020), (2021),
    (2022), (2023), (2024), (2025);

I have a second table containing people, along the lines of:

CREATE TABLE People
(
    PersonID int PRIMARY KEY, 
    PersonName varchar(50)
);
    
INSERT INTO People (PersonID, PersonName)
VALUES
    (1, 'Bob'),
    (2, 'Kate'),
    (3, 'Jo'),
    (4, 'Fred');

And I have a table containing the work that people have done, of various types each year:

CREATE TABLE Workload
(
    ID int PRIMARY KEY, 
    PersonID int, 
    YearName int, 
    WorkType varchar(8), 
    Hours int
);
    
INSERT INTO Workload (ID, PersonID, YearName, WorkType, Hours)
VALUES
    (1, 1, 2014, 'Plumbing', 7),
    (2, 1, 2020, 'Washing', 9),
    (3, 1, 2020, 'Cooking', 10),
    (4, 1, 2020, 'Drawing', 4),
    (5, 1, 2021, 'Reading', 2),
    (6, 2, 2020, 'Washing', 9),
    (7, 2, 2021, 'Cooking', 10),
    (8, 2, 2022, 'Drawing', 4),
    (9, 3, 2014, 'Cooking', 4),
    (10, 3, 2014, 'Plumbing', 22),
    (11, 3, 2015, 'Washing', 7);

I want to summarise the total amount of work done by each person, per year. So, I've got this as my starting point:

SELECT 
    PersonName, YearName, SUM(Hours) AS WorkDone
FROM 
    People p 
INNER JOIN 
    Workload w ON p.PersonID = w.PersonID
WHERE 
    YearName BETWEEN YEAR(GETDATE()) - 9 AND YEAR(GETDATE())
GROUP BY 
    PersonName, YearName

This works fine, but what I'd like to have is an output that gives a zero value for each person in each year where they did none of these tasks:

Person Year Workload
Bob 2014 7
Bob 2015 0
Bob 2016 0
Bob 2017 0
Bob 2018 0
Bob 2019 0
Bob 2020 23
Bob 2021 2
Bob 2022 0
Bob 2023 0
Kate 2014 0
Kate 2015 0
Kate 2016 0
Kate 2017 0
Kate 2018 0
Kate 2019 0
Kate 2020 9
Kate 2021 10
Kate 2022 4
Kate 2023 0

... and so on for the other people and years.

How do I best achieve this? I sense that I probably need to cross apply the three tables, but can't seem to figure out how to do so and get the results I need.


Solution

  • You could cross join the years and people tables to generate all possible combinations, then bring in the workload table with a LEFT JOIN, which ensures that "missing" year/person tuples are not filtered out. The final step is aggregation, with COALESCE() returning 0 for unmatched tuples.

    SELECT p.PersonName, y.YearName, COALESCE(SUM(w.Hours), 0) AS WorkDone
    FROM People p 
    CROSS JOIN Years y
    LEFT JOIN Workload w ON w.PersonID = p.PersonID AND w.YearName = y.YearName
    WHERE y.YearName BETWEEN YEAR(GETDATE()) - 9 AND YEAR(GETDATE())
    GROUP BY p.PersonID, p.PersonName, y.YearName
    ORDER BY p.PersonName, y.YearName
    

    Note that it is safer to include the person ID in the GROUP BY clause: two different people might share the same name, and you probably don't want their workloads to be summed together.
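
    To illustrate the point about names, here is a minimal sketch. It assumes two hypothetical extra rows that are not in the question's sample data (a second person called 'Bob' with PersonID 5, and one workload row for him), and it restricts the output to 2020 to keep it short:

    ```sql
    -- Hypothetical rows for illustration only (not part of the question's data):
    -- a second person who is also called 'Bob'.
    INSERT INTO People (PersonID, PersonName) VALUES (5, 'Bob');
    INSERT INTO Workload (ID, PersonID, YearName, WorkType, Hours)
    VALUES (12, 5, 2020, 'Cooking', 3);

    -- Grouping by name only: both Bobs collapse into a single row,
    -- so their 2020 hours are added together (23 + 3 = 26).
    SELECT p.PersonName, y.YearName, COALESCE(SUM(w.Hours), 0) AS WorkDone
    FROM People p
    CROSS JOIN Years y
    LEFT JOIN Workload w ON w.PersonID = p.PersonID AND w.YearName = y.YearName
    WHERE y.YearName = 2020
    GROUP BY p.PersonName, y.YearName;

    -- Grouping by PersonID as well: one row per person,
    -- so the two Bobs stay separate (23 and 3).
    SELECT p.PersonID, p.PersonName, y.YearName, COALESCE(SUM(w.Hours), 0) AS WorkDone
    FROM People p
    CROSS JOIN Years y
    LEFT JOIN Workload w ON w.PersonID = p.PersonID AND w.YearName = y.YearName
    WHERE y.YearName = 2020
    GROUP BY p.PersonID, p.PersonName, y.YearName;
    ```

    Selecting PersonID in the second query is optional, but it makes it possible to tell the two rows apart in the output.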

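    We could also use a correlated subquery (or apply) to do the workload computation, which would avoid outer aggregation. Because an aggregate without GROUP BY always returns exactly one row, CROSS APPLY never drops a person/year pair here: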

    SELECT p.PersonName, y.YearName, w.WorkDone
    FROM People p 
    CROSS JOIN Years y
    CROSS APPLY (
        SELECT COALESCE(SUM(w.Hours), 0) AS WorkDone
        FROM Workload w 
        WHERE w.PersonID = p.PersonID AND w.YearName = y.YearName
    ) w
    WHERE y.YearName BETWEEN YEAR(GETDATE()) - 9 AND YEAR(GETDATE())
    ORDER BY p.PersonName, y.YearName
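
    The apply version above is one way to write this; the plain correlated-subquery form mentioned alongside it could be sketched as follows. This is only a sketch, assuming the same People, Years and Workload tables from the question; the alias WorkDone is carried over from the queries above.

    ```sql
    -- Scalar correlated subquery in the SELECT list: no APPLY and no outer GROUP BY.
    -- SUM() over zero matching rows yields NULL, which COALESCE turns into 0.
    SELECT
        p.PersonName,
        y.YearName,
        COALESCE(
            (SELECT SUM(w.Hours)
             FROM Workload w
             WHERE w.PersonID = p.PersonID
               AND w.YearName = y.YearName),
            0
        ) AS WorkDone
    FROM People p
    CROSS JOIN Years y
    WHERE y.YearName BETWEEN YEAR(GETDATE()) - 9 AND YEAR(GETDATE())
    ORDER BY p.PersonName, y.YearName;
    ```

    The subquery runs once per person/year pair, so each row of the cross join keeps its own total and people with the same name are never merged.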