Search code examples
sqlsql-server

Get correct counts from grouped by values from a XREF Table that depend on values from other tables?


How to get correct counts from grouped by values from a XREF Table that depend on values from other tables?

I have tables of Person Event XPersonEvent

Person
----------------------------------
PersonID  FirstName    DateCanceled
----------------------------------
10        Joe          null
20        Bob          01/01/2024
30        Lou          null

Event
----------------------------------
EventID   EventName    MaxAttendees
----------------------------------
100       Austin       55
200       Boston       65
300       Cleveland    75

XPersonEvent
----------------------------------
ID        PersonID     EventID
----------------------------------
1000       10          100
1001       20          100
1002       30          100
1003       30          300

This SQL gives me the following report.

Select
e.EventName, e.MaxAttendees,
count(xpe.EventID) as #Registered,
count(xpe.EventID) as #Attending From [Event] as e

Join XPersonEvent as xpe on xpe.EventID = e.EventID
Join Person as p on p.PersonID = xpe.PersonID

where p.DateCanceled is null 
Group By e.EventName, e.MaxAttendees
Order by e.EventName

EventName     MaxAttendees #Registered #Attending
------------- ------------ ----------- -----------
Austin        55           2           2
Cleveland     75           1           1

What needs to be changed is

  1. I need to see all events even if they do not have anyone registered (and #Registered = 0 #Attending = 0 for them)
  2. I need correct counts of both #Registered (DateCanceld null and not null) #Attending (DateCanceld is null )

How do I get this report?

EventName     MaxAttendees #Registered #Attending
------------- ------------ ----------- -----------
Austin        55           3           2
Boston        65           0           0
Cleveland     75           1           1

Solution

    1. Use LEFT OUTER JOIN to include rows that do not have a match
    2. COUNT() in SQL Server excludes null values, which is helpful for OUTER JOIN but it counts all rows, if you want different counts from the same record set for different criteria you can use COUNT(CASE ) to only include values/rows that you want to be counted.

    I need to see all events even if they do not have anyone registered

    LEFT OUTER JOIN XPersonEvent as xpe on xpe.EventID = e.EventID
    LEFT OUTER JOIN Person as p on p.PersonID = xpe.PersonID
    

    This will mean that all the rows in the FROM table will be returned, even if there is no matching row in XPersonEvent or Person. For all the events with no match, the values from the joined tables will be NULL.

    This doesn't have a direct impact on COUNT() because the null values are excluded.

    I need correct counts of both #Registered (DateCanceld null and not null) #Attending (DateCanceld is null)

    This is less intuitive but when you need to count multiple different criteria in the same result set, you can't define that criteria in the WHERE clause, especially if you want all of the LEFT rows (in the FROM table) to be returned.

    #Registered

    This count needs to include all XPersonEvent records, even if they have been cancelled. So removing the WHERE criteria will fix this.

    #Attending

    This count needs to include only the rows that have not been cancelled, so rows where the expression p.DateCanceled is null is true. So we use that in a CASE statement:

    COUNT(CASE 
        WHEN p.DateCanceled is null 
        AND xpe.EventID IS NOT NULL
        THEN 1 END) AS #Attending
    

    It is important to also add criteria to match only when the XPersonEvent record exists, remember that the LEFT OUTER JOIN will leave all other columns NULL so we do not want these included in the count if there are no registrations at all.

    Here I have used an arbitrary value 1 to count the row, the column in this case is not relevant and 1 is a conventional way to represent this. There is no ELSE clause in the above CASE statement which means that any rows that do not match will have a value of NULL, which means they will not be counted, exactly what we want here.

    We can put this all together:

    Select
    e.EventName, e.MaxAttendees,
    count(xpe.EventID) as #Registered,
    COUNT(CASE 
        WHEN p.DateCanceled is null 
             AND xpe.EventID IS NOT NULL
        THEN 1 END) AS #Attending
    From [Event] as e
    
    LEFT OUTER Join XPersonEvent as xpe on xpe.EventID = e.EventID
    LEFT OUTER Join Person as p on p.PersonID = xpe.PersonID
    
    Group By e.EventName, e.MaxAttendees
    Order by e.EventName
    

    NOTE: In some other RDBMS this trick with COUNT(CASE ) doesn't work if they do count nulls, instead you have to use SUM(CASE ... THEN 1 ELSE 0)

    EventName MaxAttendees #Registered #Attending
    Austin 55 3 2
    Boston 65 0 0
    Cleveland 75 1 1