I have two tables Employee Master and Sales Fact with below structurs
Employee master:
empId | empName |
---|---|
1 | Jack |
2 | John |
3 | Luke |
Sales Fact:
empId | SalesDate | Amount |
---|---|---|
1 | 2023-10-12 | 100 |
2 | 2023-10-23 | 200 |
1 | 2023-11-12 | 100 |
3 | 2023-11-23 | 200 |
2 | 2023-12-12 | 100 |
3 | 2023-12-23 | 200 |
When I join these two tables and filter for a particular month, the employees who have not made any sales on that month are not shown. (Eg: When I filter for October 2023 only 1 and 2 are shown, when I filter for November 2023, only 1 and 3 are shown)
But my requirement is that I need to create a view to show all the employees for each month regardless of whether they have made a sale or not. (Eg: When I filter for October 2023 then I want all 3 employees and the amount should be NULLfor empId 3)
My current solution is to join these two tables every month and insert the data into a new table having one extra MonthYear column which is what I use for filter. But I have to run this monthly which is what I want to avoid
Note: Recursive CTEs are not avaliable in my databricks workspace
Here are the codes to create and insert values for these tables:
-- Create the table
CREATE TABLE employeeMaster (
empId INT,
empName STRING
);
-- Insert the values into the table
INSERT INTO employees VALUES
(1, 'Jack'),
(2, 'John'),
(3, 'Luke');
-- Create the table
CREATE TABLE sales (
empId INT,
SalesDate DATE,
Amount INT
);
-- Insert the values into the table
INSERT INTO salesFact VALUES
(1, '2023-10-12', 100),
(2, '2023-10-23', 200),
(1, '2023-11-12', 100),
(3, '2023-11-23', 200),
(2, '2023-12-12', 100),
(3, '2023-12-23', 200);
Use a subquery to return all the months (for which there is at least 1 sale), then cross join to employee to get all combos of employee and months and finally left join to sales:
select
e.empId,
e.empName,
mths.mth,
sum(s.amount) total_amount
from (select distinct date(date_trunc('MONTH', SalesDate)) mth from salesFact) mths
cross join employeeMaster e
left join salesFact s on s.empId = e.empId
and date(date_trunc('MONTH', s.SalesDate)) = mths.mth
group by 1, 2, 3
order by 3, 1
See live demo.
If you want a zero instead of a null
for total amounts where there is no sale for the month for an employee, use
coalesce(sum(s.amount), 0) total_amount