Search code examples
joinself-joinfull-outer-join

How to join tables of events to report non-events?


Employees must complete continuing education modules to stay competent. I need to build a report showing supervisors which modules their employees have – and have not -- completed.

I have a simple table of employees and their supervisors. Each row is an employee, with two columns: Supervisor Name and Employee Name. Call this the Staff table.

Supervisor Employee Name
Mike Jack

I have a table of module completions. Each row is an event -- a module completion, with columns for the Module Code, Employee Name, and Date of Completion. Call this the Completions table.

Employee Name Module Code Date of Completion
Jack MOD1 2022-01-20
Jack MOD2 2022-01-21

I have a table of all the Modules. Each row is a module, with columns for the Module Code and other module-level information, such as Module Name, etc.

Module Code Module Name Format
MOD1 Important Content Online
MOD2 Super Important Content Online
MOD3 Oh so Critical Content In person
MOD4 You Must Learn This Content In person

I can join the Staff table to the Completion table on Employee Name. That gives me a list of employees and the modules they have completed.

Employee Name Module Code Completion Date Completions
Jack MOD1 2022-01-20 1
Jack MOD2 2022-01-21 1

I’m halfway there.

The challenge is how to report the uncompleted modules as well, so that we see this:

Employee Name Module Code Completion Date Completions
Jack MOD1 2022-01-20 1
Jack MOD2 2022-01-21 1
Jack MOD3 0
Jack MOD4 0

By what configuration of joins among the Staff, Completions, and Modules tables can I report completions of all modules for every employee, even there is no completion date for some modules?

I'm stumped. I'm imagining different kinds of joins -- some kind of full outer join of Completions and Modules, or some kind of self join of Completions that reports a sum of all completions for all modules for every employee -- but I've reached the current limit of my skill.


Solution

  • Since we don't know what modules each employee has been assigned. We then assume each module must be associated to each employee and we build that assocation.

    To do this we use a CROSS JOIN which is every record of one table related to every record of another table. Caution as this results in a Cartesian product. (1000*1000) = 1,000,000 records! but each employee is then assigned a module.

    SELECT ...
    FROM SupervisorEmployee E 
    CROSS JOIN Modules  M -- This cross join ensure every employee is assigned every 
                          -- record.  note there is no ON clause when using a cross join.
    LEFT JOIN Completions C 
           on {keys} 
    WHERE...
    

    NOTE: Since we are using a left join. the where clause should only have limits on the SupervisorEmploye and Modules table. It should not have a limit on the completions table or it will negate the left join. (you can do it but you have to use an or and handle nulls it's just simpler to not put limits here)

    If you need to limit on completions; do so using the On criteria so the reduction occurs before/as the join occurs and you retain the records without a completion.