Search code examples
sqlpostgresqlgenerate-scripts

Get random row from table when generating new table


I have table categories with columns category_id and name. I want to generate new table products with random data. Table products has columns product_id, category_id, name, price. I want to give to each product random category_id.

My generate script for table products:

INSERT INTO products (category_id, name, price) 
SELECT
   (SELECT category_id FROM categories OFFSET floor(random()*50) LIMIT 1),
   ('Product ' || i),
   round(cast(random() * 999 + 1 as numeric), 2)
FROM generate_series(1,100) as seq(i)

This script generate 100 rows of products, but each product has the same category_id. What I am doing wrong?


Solution

  • The "problem" is that Postgres is too aggressive about optimization. It is able to recognize that the subquery could be called just once -- somehow ignoring the fact that random() is not a deterministic function.

    One method to fix this involves a correlated subquery:

    SELECT (SELECT category_id
            FROM categories c
            WHERE c.category_id <> seq.i
            OFFSET floor(random()*3)
            LIMIT 1
           ),
           ('Product ' || i),
           round(cast(random() * 999 + 1 as numeric), 2)
    FROM generate_series(1,100) as seq(i);