Search code examples
sql-servert-sql

How to filter based on first occurrence in my table using SQL


I am using SQL Server.

Here is my table:

RepID  ICD  Diagnosed  RandStat
-------------------------------
1001   A    1          11
1002   A    0          11
1003   A    0          11
1005   A    1          11

1003   A    1          22
1005   A    0          22
1008   A    1          22
1009   A    0          22

1001   A    0          33
1004   A    1          33
1005   A    0          33
1007   A    0          33

1003   A    0          44
1004   A    0          44
1005   A    1          44
1007   A    1          44


1001   B    0          11
1004   B    1          11
1006   B    1          11
1008   B    1          11

1004   B    0          22
1006   B    1          22
1008   B    0          22
1009   B    1          22

1001   B    0          33
1002   B    1          33
1003   B    1          33
1006   B    0          33

1003   B    1          44
1004   B    0          44
1005   B    0          44
1007   B    1          44

I want to filter out the repeated RepIDs for the same ICD based on the first RandStat (ordered by RandStat).

So if RepID appeared on RandStat 11, then it should be removed from RandStat 22 or above (all for the same ICD)

The final results should look like this:

RepID  ICD  Diagnosed  RandStat
-------------------------------
1001   A    1          11
1002   A    0          11
1003   A    0          11
1005   A    1          11

1008   A    1          22
1009   A    0          22

1004   A    1          33
1007   A    0          33

1001   B    0          11
1004   B    1          11
1006   B    1          11
1008   B    1          11

1009   B    1          22

1002   B    1          33
1003   B    1          33

1005   B    0          44
1007   B    1          44

I tried this query

SELECT A.*
FROM MyTable A 
INNER JOIN MyTable B ON A.RepID = B.RepID
                     AND A.ICD = B.ICD
                     AND A.RandStat < B.RandStat

but this took me nowhere....

Any idea how to do that?


Solution

  • We can use ROW_NUMBER here:

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ICD, RepID ORDER BY RandStat) rn
        FROM yourTable
    )
    
    SELECT RepID, ICD, Diagnosed, RandStat
    FROM cte
    WHERE rn = 1
    ORDER BY ICD, RandStat, RepID;