I am trying to do a full join between a subset of a table and each other subset of the same table, even if some rows are missing. I have a table of the form:
CREATE TABLE #Test (
SiteID int NOT NULL
, DayOfWeekID tinyint NOT NULL
, Quantity float NOT NULL );
This is filled with the quantity of sales at each site for each day of the week. I then use SiteID = -1 as an aggregate (mean), and I am looking for the error between each site and this mean, e.g.
CREATE TABLE #TestError (
SiteID int NOT NULL
, Error float NOT NULL );
INSERT INTO #TestError
( SiteID
, Error
)
SELECT t1.SiteID
, SUM(ABS(isnull(t1.Quantity, 0) - t2.Quantity))
FROM #Test t1
JOIN #Test t2
ON t1.DayOfWeekID = t2.DayOfWeekID
AND t2.SiteID = -1
GROUP BY t1.SiteID;
This works fine provided that, for every SiteID, all seven days of the week are present. However, if a site has no sales for a given day of the week, then that row will be missing from the summation and the reported error will be lower than it should be. On the other hand, if I use a RIGHT JOIN instead of the JOIN, I can't group by t1.SiteID, since that value would be NULL in the generated row. The obvious answer is that I need to ensure that every SiteID has all seven days, by instantiating #Test as a cross join between a table of SiteIDs and a table containing DayOfWeekIDs 1-7, with initial quantities zero, and then updating it with real values where they exist. But that seems inefficient, and more broadly this is something that has come up for me before and I'd like to understand the 'proper' SQL way of writing this kind of join. Thanks!
If site -1
has all days of the week, you can cross join
its records with the list of distinct sites, then bring the table with a left join
and aggregate:
select s.siteid, sum(abs(coalesce(t1.quantity, 0) - t0.quantity)) as diff
from #test t0
cross join (select distinct siteid from #test) s
left join #test t1 on t1.dayofweekid = t0.dayofweekid and t1.siteid = s.siteid
where t0.siteid = -1
group by s.siteid