Search code examples
randomsql-server-2012top-n

random join in SQL Server with a varying number of random results


SQL Server's SQL has become so clever that what looks like it might require a procedural solution can often be done with pure SQL. I am wondering if this is one of those times.
Let's say we have a STATES table and a CITIES table.

STATES:
State:  NY

CITIES
State: NY
City:  Armonk

Now let's complicate things with a third table: INSTRUCTIONS

INSTRUCTIONS
State: NY
HowMany: 17

State: NJ
HowMany: 11

Is there any way in SQL Server SQL to select HowMany cities at random from the CITIES table when the three tables are joined on State?

We don't know the "top N" in advance. It changes by state.

Of course, the States table will have all the 50 states, the Cities table all the cities in each state, and Instructions will have one record per state, identifying how many cities from that state are needed (chosen randomly).

P.S. Sample desired results (assuming the Instruction for NY is HowMany=5 and the Instruction for NJ is HowMany = 4, and order by STATES.state):

NJ.....Princeton
NJ.....Newark
NJ.....Camden
NJ.....Princeton
NY.....Armonk
NY.....Schenectady
NY.....White Plains
NY.....Niagara Falls
NY.....Rochester

Solution

  • I find the RAND() function used in the other answers causes some problems by not being a new random number for each row.

    CHECKSUM(NEWID()) worked well for me in this scenario. (See RAND not different for every row in T-SQL UPDATE)

    I think this solution is nice and tidy:

    SELECT
    RandomCities.[State]
    ,[RandomCities].City
    FROM
        (
            SELECT 
            s.[state]
            ,city
            ,ROW_NUMBER() OVER (PARTITION BY s.[State] ORDER BY CHECKSUM(NEWID())) AS [RandomOrder]
            FROM
            States s
            INNER JOIN Cities c ON c.[state]=s.[state]
        ) AS RandomCities
    INNER JOIN instructions i ON i.[state]=RandomCities.[state]
    WHERE RandomCities.RandomOrder<=i.HowMany