Search code examples
sqlverticasampling

Sample observations per group without replacement in SQL


Using the provided table I would like to sample let's say 2 users per day so that users assigned to the two days are different. Of course the problem I have is more sophisticated, but this simple example gives the idea.

drop table if exists test; 

create table test (
user_id int,
day_of_week int);

insert into test values (1, 1);
insert into test values (1, 2);
insert into test values (2, 1);
insert into test values (2, 2);
insert into test values (3, 1);
insert into test values (3, 2);
insert into test values (4, 1);
insert into test values (4, 2);
insert into test values (5, 1);
insert into test values (5, 2);
insert into test values (6, 1);
insert into test values (6, 2);

The expected results would look like this:

create table results (
user_id int,
day_of_week int);

insert into results values (1, 1);
insert into results values (2, 1);
insert into results values (3, 2);
insert into results values (6, 2);

Solution

  • You can use window functions. Here is an example . . . although the details do depend on your database (functions for random numbers vary by database):

    select t.*
    from (select t.*, row_number() over (partition by day_of_week order by random()) as seqnum
          from test t
         ) t 
    where seqnum <= 2;