Search code examples
sqlu-sql

U-SQL Is it accepted to use more than one FIRST_VALUE to remove duplicates in specific columns?


I have a table where multiple rows are duplicates because of two date columns which their value are different.

I want to know if it is accepted to use FIRST_VALUE in both columns like this, to remove duplicate on specified columns:

SELECT
 EmployeeName,
 FIRST_VALUE(StartDateTime) OVER(ORDER BY UpdatedDateTime DESC) AS StartDateTime,
 FIRST_VALUE(UpdatedDateTime) OVER(ORDER BY UpdatedDateTime DESC) AS UpdatedDateTime 
 FROM @Employees;

Solution

  • If you need to remove duplicates over some field, you need to use ROW_NUMBER() and CTE:

    -- Sample data: dates duplicates
    declare @t table (id int, dt date);
    insert into @t values
    (1, '2018-01-14'),
    (1, '2018-01-14'),
    (1, '2018-01-15'),
    (1, '2018-01-15');
    with cte as (
        select *,
               -- assign row number for each partition consisting of same date
               row_number() over (partition by dt order by dt) as cnt
        from @t
    )
    -- we're interested only in one row (i.e. first)
    select id, dt from cte where cnt = 1;
    
    /*
     Output:
    +-------+---------+
    | id | dt         |
    +----+------------+
    |  1 | 2018-01-14 |
    |----|------------|
    |  1 | 2018-01-15 |
    +----+------------+
    */