Search code examples
sqlsql-serversql-server-2014

SQL Server - Divide a dataset to same size groups with random rows


Have a base dataset which is a set of ID's, F.E.

000
111
222
333
444
555
666
777
888
999

and a keys dataset

1 - 20
2 - 40
3 - 40

The result should be a random pairing of the base dataset with the keys dataset according to the defined percentage, meaning 20% will recieve 1, 40% will recieve 2 and 40% will recieve 3

F.E. - first run

000 - 1
111 - 2
222 - 3
333 - 1
444 - 2
555 - 3
666 - 2
777 - 3
888 - 2
999 - 3

second run

000 - 2
111 - 3
222 - 2
333 - 3
444 - 2
555 - 3
666 - 1
777 - 3
888 - 2
999 - 1

etc.

Using SQL server 2014

Thought of solutions that involve loops and cusrors and temp tables but I was wondering if there is a "cleaner" and most probably more efficient solution...

Any ideas?


Solution

  • Here is one way using Numbers table.

    ;WITH base_dataset
         AS (SELECT *,
                    Row_number()OVER(ORDER BY id) AS rn
             FROM   (VALUES (000),
                            (111),
                            (222),
                            (333),
                            (444),
                            (555),
                            (666),
                            (777),
                            (888),
                            (999)) tc (ID)),
         keys
         AS (SELECT *
             FROM   (VALUES (1,20),
                            (2,50),
                            (3,30)) tc(val, per)),
         num_gen
         AS (SELECT 1        AS num,
                    Count(1) AS cnt
             FROM   base_dataset
             UNION ALL
             SELECT num + 1,
                    cnt
             FROM   num_gen
             WHERE  num < cnt)
    SELECT Id,val
    FROM   (SELECT Row_number()OVER(ORDER BY newid()) rn,
                   val
            FROM   num_gen n
                   JOIN keys k
                     ON n.num <= (k.per/100.0) * cnt) a
           JOIN base_dataset d
             ON d.rn = a.rn 
    

    I have used Recursive CTE to generate numbers you can create a numbers table in database and use it