Search code examples
joindatabricks-sql

Show all rows from employees master table even when applying filters after joining with Sales Table


I have two tables Employee Master and Sales Fact with below structurs

Employee master:

empId empName
1 Jack
2 John
3 Luke

Sales Fact:

empId SalesDate Amount
1 2023-10-12 100
2 2023-10-23 200
1 2023-11-12 100
3 2023-11-23 200
2 2023-12-12 100
3 2023-12-23 200

When I join these two tables and filter for a particular month, the employees who have not made any sales on that month are not shown. (Eg: When I filter for October 2023 only 1 and 2 are shown, when I filter for November 2023, only 1 and 3 are shown)

But my requirement is that I need to create a view to show all the employees for each month regardless of whether they have made a sale or not. (Eg: When I filter for October 2023 then I want all 3 employees and the amount should be NULLfor empId 3)

My current solution is to join these two tables every month and insert the data into a new table having one extra MonthYear column which is what I use for filter. But I have to run this monthly which is what I want to avoid

Note: Recursive CTEs are not avaliable in my databricks workspace

Here are the codes to create and insert values for these tables:

-- Create the table
CREATE TABLE employeeMaster (
    empId INT,
    empName STRING
);

-- Insert the values into the table
INSERT INTO employees VALUES
(1, 'Jack'),
(2, 'John'),
(3, 'Luke');

-- Create the table
CREATE TABLE sales (
    empId INT,
    SalesDate DATE,
    Amount INT
);

-- Insert the values into the table
INSERT INTO salesFact VALUES
(1, '2023-10-12', 100),
(2, '2023-10-23', 200),
(1, '2023-11-12', 100),
(3, '2023-11-23', 200),
(2, '2023-12-12', 100),
(3, '2023-12-23', 200);

Solution

  • Use a subquery to return all the months (for which there is at least 1 sale), then cross join to employee to get all combos of employee and months and finally left join to sales:

    select
      e.empId,
      e.empName,
      mths.mth,
      sum(s.amount) total_amount
    from (select distinct date(date_trunc('MONTH', SalesDate)) mth from salesFact) mths
    cross join employeeMaster e
    left join salesFact s on s.empId = e.empId 
      and date(date_trunc('MONTH', s.SalesDate)) = mths.mth
    group by 1, 2, 3
    order by 3, 1
    

    See live demo.


    If you want a zero instead of a null for total amounts where there is no sale for the month for an employee, use

    coalesce(sum(s.amount), 0) total_amount