I have a big table in my database (potentially millions of records) and I need to select #X random rows (let's say #X between 10 and 50) , but I need this query to be as optimal as possible.
The table looks like this:
CREATE TABLE sample (
id bigint auto_increment PRIMARY KEY,
user_id bigint NOT NULL,
screen_name VARCHAR NOT NULL,
...
);
I've searched around and I found answers like this:
SELECT * FROM sample ORDER BY RAND() limit X.
But It looks to me that this will fetch the full table then sort it, isn't it?
I think it would be best to generate 10 or 50 random integers and do a select * from sample where rowid in (<random integer list>)
. But afaik, rowid concept is missing in H2, so I may opt for using the ID column in my table.
It would be awesome if I can do this task with a single SQL query.
Any better advice?
The following script selects every nth row quite efficiently. It assumes there are no gaps in the ids. If gaps are possible, then you might want to increase the range(1, 100) to range(1, 200) or so. To get random rows, the formula at the very end would need to be changed a bit:
drop table test;
create table test(
id bigint auto_increment primary key,
name varchar(255));
insert into test
select x, 'Hello ' || x from system_range(50, 1200);
select * from test t, system_range(1, 100) range
where t.id = x * (select max(id)-min(id) from test) / 100 +
(select min(id) from test);