Search code examples
sql-serverrandomsqlperformance

Get random data from SQL Server without performance impact


I need to select random rows from my sql table, when search this cases in google, they suggested to ORDER BY NEWID() but it reduces the performance. Since my table has more than 2'000'000 rows of data, this solution does not suit me.

I tried this code to get random data :

SELECT TOP 10 * 
FROM Table1
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS INT)) % 100) < 10 

It also drops performance sometimes.

Could you please suggest good solution for getting random data from my table, I need minimum rows from that tables like 30 rows for each request. I tried TableSAMPLE to get the data, but it returns nothing once I added my where condition because it return the data by the basis of page not basis of row.


Solution

  • Try to calc the random ids before to filter your big table.
    since your key is not identity, you need to number records and this will affect performances..

    Pay attention, I have used distinct clause to be sure to get different numbers

    EDIT: I have modified the query to use an arbitrary filter on your big table

    declare @n int = 30
    
    ;with
    t as (
        -- EXTRACT DATA AND NUMBER ROWS
        select *, ROW_NUMBER() over (order by YourPrimaryKey) n
        from YourBigTable t
        -- SOME FILTER
        WHERE 1=1  /* <-- PUT HERE YOUR COMPLEX FILTER LOGIC */
    ),
    r as (
        -- RANDOM NUMBERS BETWEEN 1 AND COUNT(*) OF FILTERED TABLE
        select distinct top (@n) abs(CHECKSUM(NEWID()) % n)+1  rnd
        from sysobjects s
        cross join (SELECT MAX(n) n FROM t) t
    )
    select t.*
    from t
    join r on r.rnd =  t.n