Search code examples
arrayspostgresqlquery-optimizationjsonb

Query to select rows where JSONB arrays have variable lengths


I have a table (t) which contains a column (c) of JSONB objects, each containing an array (a) with a varying length between 1-10 (inclusive). I need to build a query that selects 1000 rows from t where c includes 100 random rows of each possible length of a. What would be the most concise way to write this query? My current query structure looks like this:

WITH length_1 AS (
    SELECT *
    FROM t
    WHERE JSONB_ARRAY_LENGTH(c -> 'a') = 1
    ORDER BY RANDOM()
    LIMIT 100
),
length_2 AS (
    SELECT *
    FROM t
    WHERE JSONB_ARRAY_LENGTH(c -> 'a') = 2
    ORDER BY RANDOM()
    LIMIT 100
)
...
SELECT *
FROM length_1
UNION
SELECT *
FROM length_2
...

Solution

  • You can use a window function to label each row with a row number within its partition, then use an outer select to limit to 100 for each partition.

    select * from (
        select t.*, row_number() over (partition by jsonb_array_length(c->'a') order by random()) as rn from t
    ) foo  where rn<=100;
    

    The two levels are needed because you can't use a window function in a WHERE or a HAVING.