I have table categories
with columns category_id
and name
. I want to generate new table products
with random data. Table products
has columns product_id, category_id, name, price
. I want to give to each product random category_id
.
My generate script for table products
:
INSERT INTO products (category_id, name, price)
SELECT
(SELECT category_id FROM categories OFFSET floor(random()*50) LIMIT 1),
('Product ' || i),
round(cast(random() * 999 + 1 as numeric), 2)
FROM generate_series(1,100) as seq(i)
This script generate 100 rows of products, but each product has the same category_id
. What I am doing wrong?
The "problem" is that Postgres is too aggressive about optimization. It is able to recognize that the subquery could be called just once -- somehow ignoring the fact that random()
is not a deterministic function.
One method to fix this involves a correlated subquery:
SELECT (SELECT category_id
FROM categories c
WHERE c.category_id <> seq.i
OFFSET floor(random()*3)
LIMIT 1
),
('Product ' || i),
round(cast(random() * 999 + 1 as numeric), 2)
FROM generate_series(1,100) as seq(i);