Search code examples
sqlrandomdistinct-valuesnewid

Preventing duplicates with random rows and duplicates


I have a dataset where I need to randomly select x number of rows that meet certain criteria, x number of rows that meet other criteria, etc., and outputs the results all in one table. I've been using the following:

SELECT * FROM (SELECT TOP 1000 * FROM dbo.[Client List]
WHERE Source = 'HC' AND Service = 'Service1' AND Provider = 'ProviderName1'
ORDER BY NEWID()) Table1

UNION ALL

SELECT * FROM (SELECT TOP 500 * FROM dbo.[Client List]
WHERE Source = 'HC' AND Service = 'Service2' AND Provider = 'ProviderName2'
ORDER BY NEWID()) Table2

UNION ALL

SELECT * FROM (SELECT TOP 2200 * FROM dbo.[Client List]
WHERE Source = 'BA' AND Service = 'Service3' AND Provider = 'ProviderName3'
ORDER BY NEWID()) Table3

This works, but there's one problem: It's allowing duplicates of the client identifier (dbo.[Client List].[ClientID] to be selected. All of my results must be unique clients.

In other words, it must do the first random select, then do the second random select without being able to select any of the ClientIDs selected in the first select, and so on. (Yes, I realize that this is not technically "random".) Is there a way I can add some sort of code in the WHERE statement of each subsequent SELECT, or do I need to rethink the whole structure of the code? Thanks!


Solution

  • How about something like this?

    with Service1 as
    (
        SELECT TOP 1000 * 
        FROM dbo.[Client List]
        WHERE Source = 'HC' 
            AND Service = 'Service1' 
            AND Provider = 'ProviderName1'
        ORDER BY NEWID()
    )
    , Service2 as
    (
        SELECT TOP 500 * 
        FROM dbo.[Client List]
        WHERE Source = 'HC' 
            AND Service = 'Service2' 
            AND Provider = 'ProviderName2'
            AND ClientID not in (select ClientID from Service1)
        ORDER BY NEWID()
    )
    , Service3 as
    (
        SELECT TOP 2200 * 
        FROM dbo.[Client List]
        WHERE Source = 'BA' 
            AND Service = 'Service3' 
            AND Provider = 'ProviderName3'
            AND ClientID not in (select ClientID from Service1)
            AND ClientID not in (select ClientID from Service2)
        ORDER BY NEWID()
    )
    
    select * 
    from Service1
    
    union all
    
    select *
    from Service2
    
    union all
    
    select *
    from Service3