Search code examples
sqlsql-server

How to remove duplicate in the same row


I have a table with column columna with data in the same row:

columna
-----------------
a,b,a,b,a,b,a,b

I tried this, but it's not good

duplicated_rows AS 
(
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY columna) AS row_num
    FROM 
        tablea
)
SELECT * 
FROM duplicated_rows
WHERE row_num > 1;

Not sure how to remove duplicate so I will get result

columna
-------
a,b

Solution

  • Please try the following solution based on SQL Server's XML and XQuery functionality.

    It will work starting from SQL Server 2012 onwards.

    Notable points:

    • CROSS APPLY convert column value into XML.
    • XQuery distinct-values() function does the magic.

    SQL

    DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens VARCHAR(1000));
    INSERT INTO @tbl (tokens) VALUES 
    ('a,b,a,b,a,b,a,b'),
    ('cat,dog,cat');
    
    DECLARE @separator CHAR(1) = ',';
    
    SELECT t.*, c
       , REPLACE(c.query('distinct-values(/root/r)').value('.','VARCHAR(1000)'), SPACE(1), @separator) AS result
    FROM @tbl AS t
    CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
        REPLACE(tokens, @separator, ']]></r><r><![CDATA[') + 
        ']]></r></root>' AS XML)) AS t1(c);
    

    Output

    id tokens result
    1 a,b,a,b,a,b,a,b a,b
    2 cat,dog,cat cat,dog