Search code examples
sqlsql-serverjoinduplicateshaving

SQL - Removing duplicates, selecting most recent using HAVING clause & an NTEXT column


My query returns customer responses. A response table (rp) is joined to the customer table (ah). The query returns the results I want.

However it returns more than 1 response per customer if they have multiple active responses recorded.

So I tried to limit it by adding a HAVING clause yet I can't group with the responsefreetxt due to it being NTEXT.

SELECT 
    ah.CUSTACCOUNT,         /--- type = CHAR
    rp.RESPONSEFREETXT      /--- type = NTEXT
FROM 
    ah
INNER JOIN 
    rp ON rp.CUSTACCOUNT = ah.CUSTACCOUNT
    ...
WHERE 
    ah.ACTIVE = 1
    AND rp.ACTIVE = 1
    AND ...
GROUP BY 
    ah.CUSTACCOUNT, rp.RESPONSEFREETXT
HAVING 
    COUNT(ah.CUSTACCOUNT) = 1
ORDER BY 
    rp.date DESC

Results in this error:

Msg 306, Level 16, State 2, Line 21
The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

Ideally I want to return 1 row per customer and in cases of duplicates, only return the most recent response.

How can I do so?

Thanks!


Solution

  • One way to handle this from SQL Server is to use ROW_NUMBER:

    WITH cte AS (
        SELECT ah.CUSTACCOUNT, rp.RESPONSEFREETXT,
               ROW_NUMBER() OVER (PARTITION BY ah.CUSTACCOUNT
                                  ORDER BY rp.SOME_DATE DESC) rn
        FROM ah
        INNER JOIN rp ON rp.CUSTACCOUNT = ah.CUSTACCOUNT
        ...    
        WHERE ah.ACTIVE = 1 AND rp.ACTIVE = 1 AND ...
        -- I have removed the GROUP BY clause
    )
    
    SELECT *
    FROM cte
    WHERE rn = 1;
    

    This assumes that there exist a column called SOME_DATE in the responses table which keeps track of the timestamp of each customer response. Another way of writing the above query would use TOP 1 WITH TIES:

    SELECT TOP 1 WITH TIES ah.CUSTACCOUNT, rp.RESPONSEFREETXT
    FROM ah
    INNER JOIN rp ON rp.CUSTACCOUNT = ah.CUSTACCOUNT
    ...    
    WHERE ah.ACTIVE = 1 AND rp.ACTIVE = 1 AND ...
    ORDER BY ROW_NUMBER() OVER (PARTITION BY ah.CUSTACCOUNT
                                ORDER BY rp.SOME_DATE DESC);