Search code examples
sqlselectgoogle-bigquerygreatest-n-per-groupwindow-functions

SQL Bigquery : Limiting the selection from a particular group to 10


Below is the sample table

Currently the table has unlimited entries for each ID.

My requirement is, to first sort the IDs in the ascending order of rand. Then take only the first 2 rows ['ID' and 'companies'.]

CREATE TABLE table_name (
    ID int,
    companies varchar(255),
    rand float(2)
);

INSERT INTO table_name VALUES (1, 'a', 0.2);
INSERT INTO table_name VALUES (1, 'b', 0.6);
INSERT INTO table_name VALUES (2, 'a', 0.4);
INSERT INTO table_name VALUES (2, 'b', 0.5);
INSERT INTO table_name VALUES (2, 'c', 0.3);
INSERT INTO table_name VALUES (3, 'a', 0.6);
INSERT INTO table_name VALUES (3, 'b', 0.7);
INSERT INTO table_name VALUES (3, 'c', 0.4);
INSERT INTO table_name VALUES (3, 'd', 0.2);

i.e for the final table should contain only at max 2 rows per ID. (need not contain rand column)


Solution

  • Do you want row_number()?

    select * except(rn)
    from (
        select t.*, row_number() over(partition by id order by rand) rn
        from table_name t
    ) t
    where rn <= 2
    

    This selects maximum two records per id, which have the samllest rand; you can arrange the order by clause of row_number() to your actual sorting criteria.