Search code examples
sqlsql-serverjoinmaxaggregate-functions

SQL Server: select MAX Value when LEFT JOIN and then populate the column in the table


I have two tables Response and Distributions:

Table structure with expected output

In this case, there are multiple responses but 1 distribution. We need to tie just one latest response before the assignment date to the distribution, basically

MAX(COALESCE(RESPONSE_DATE, CREATED_DATE)) <= ASSIGNMENT_DATE

The SQL query I tried:

SELECT 
    resp.CONTACT_ID, resp.RESPONSE_ID, resp.RESPONSE_DATE, 
    resp.CREATED_DATE, d.ASSIGNMENT_DATE AS DISTRIBUTION_DATE
FROM 
    Response resp
LEFT JOIN 
    Distribution d ON resp.CONTACT_ID = d.CONTACT_ID
-- 12 Hour Grace Period For assignments created before response
                   AND DATEADD(hour, -12, COALESCE(resp.RESPONSE_DATE, resp.CREATED_DATE)) <= d.ASSIGNMENT_DATE

This query returns DISTRIBUTION_DATE as 2020-10-28 for first two rows when.

This condition

DATEADD(hour, -12, COALESCE(resp.RESPONSE_DATE, resp.CREATED_DATE)) <= ASSIGNMENT_DATE

is satisfied (see SQL query returning wrong output table in the attached screenshot).

However, I want DISTRIBUTION_DATE as "2020-10-28" only for the second row as shown in the expected output. The reason is the latest response just before the assignment date will get distributed and I don't care about initial X responses (We should tie one distribution to only one latest response)

I tried to use

MAX(DATEADD(hour, -12, COALESCE(resp.RESPONSE_DATE, resp.CREATED_DATE))) <= ASSIGNMENT_DATE 

in the JOIN ON condition but that doesn't work in SQL.

Please let me know how to structure the query and get the expected output.

NOTE: The join from Response to Distribution has to be on CONTACT_ID, there is no explicit JOIN_KEY and it can lead to 1:M joins as we have same CONTACT_ID (that's the reason a lot of filtering is done in join ON condition), ideal scenario is to have RESPONSE_ID on Distribution table as well but that's not how the data is structured.

TIA


Solution

  • If I understand what you require, you'll need to run a query like this to get the expected results:

    SELECT CONTACT_ID, RESPONSE_ID, RESPONSE_DATE, CREATED_DATE, CASE WHEN RANKING = 1 THEN DISTRIBUTION_DATE ELSE NULL END AS DISTRIBUTION_DATE
    FROM (
        SELECT 
            resp.CONTACT_ID, resp.RESPONSE_ID, resp.RESPONSE_DATE, 
            resp.CREATED_DATE, d.ASSIGNMENT_DATE AS DISTRIBUTION_DATE,
            DATEADD(hour, -12, COALESCE(resp.RESPONSE_DATE, resp.CREATED_DATE)) AS GRACE_DATE,
            RANK() OVER (
            PARTITION BY resp.CONTACT_ID
            ORDER BY 
                CASE 
                    WHEN DATEADD(hour, -12, COALESCE(resp.RESPONSE_DATE, resp.CREATED_DATE)) <= d.ASSIGNMENT_DATE
                        THEN DATEADD(hour, -12, COALESCE(resp.RESPONSE_DATE, resp.CREATED_DATE))
                    ELSE 
                        '19000101'
                END
            DESC ) AS RANKING
        FROM 
            Response resp
        LEFT JOIN 
            [Distribution] d ON resp.CONTACT_ID = d.CONTACT_ID
    ) DerivedTable
    ORDER BY CONTACT_ID, RESPONSE_ID
    

    Basically, I use a rank to workout the closest date based on the assignment date, and then know that the ranking where it is 1 should be the one to show the distribution date.