Search code examples
sql-serversas-jmp

select random ~100 million rows of billions


I'm a little compute-constrained because I don't want to kill the database or bandwidth while pulling for hours.

my current query is something like this:

SELECT TOP (100000000) mycolumns
FROM mydb
WHERE lower_control_limit> value1
    AND upper_control_limit < value2 ;

I am not getting a decent random sample, but I am afraid of putting commands that have to traverse billions of rows before returning values. I need a uniform random sample over the domain (rows).

What is the "best practice" for pulling a few hundred million rows from a store that is many billions of rows in size?

I've seen the random, but I'm just worried about the size of the query vs. limited computing and bandwidth resources.


Solution

  • Found this: Select n random rows from SQL Server table

     SELECT TOP (100000000) mycolumns
     FROM mydb order by newid()