Inserting dummy data into Postgres table

I have got the following Postgres table:

create table test (
    id serial, 
    contract varchar, 
    amount0 int, 
    amount1 int, 
    price double precision
);

I would like to insert 100 rows of dummy data that conforms to the following:

– In column 'contract' there should be values out of 'abc', 'klm' and 'xyz'.

– In columns 'amount0' and 'amount1' there should be integer values out of 50, 60, 80, 100, 200.

– In column 'price' there should be values out of 1.5, 1.8, 2.1, 2.5.

What I have come up with so far is this:

INSERT INTO test (amount0, amount1, price)
SELECT
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[1.5, 1.8, 2.1, 2.5]) AS val ORDER BY random() LIMIT 1)
FROM generate_series(1, 100);

But this doesn't do the trick. This statement always only uses '100' for column 'amount0', '50' for column 'amount1' and '2.1' for column 'price'.

Can anyone help me out?

Solution

You can randomly subscript the array of choices: (demo)

INSERT INTO test (contract,amount0, amount1, price)
SELECT (ARRAY['abc','klm','xyz']   )[(random()*2+1)::int],
       (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
       (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
       (ARRAY[1.5, 1.8, 2.1, 2.5]  )[(random()*3+1)::int]
FROM generate_series(1, 1e2, 1);

SELECT id, contract,amount0, amount1, price FROM test LIMIT 6;

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8

In PostgreSQL 16+, there's also array_sample(). Its intent is clearer, it looks a bit better, lets you sample multiple, non-consecutive elements. A potential downside is you can't make it repeatable with setseed(): (demo2)

INSERT INTO test (contract,amount0, amount1, price)
SELECT (array_sample(ARRAY['abc','klm','xyz']   ,1))[1],
       (array_sample(ARRAY[50, 60, 80, 100, 200],1))[1],
       (array_sample(ARRAY[50, 60, 80, 100, 200],1))[1],
       (array_sample(ARRAY[1.5, 1.8, 2.1, 2.5]  ,1))[1]
FROM generate_series(1, 1e2, 1);

Note that this does what you intended to do, making each selection independently random. In effect, it's quite likely that you'll get duplicate entries. If you prefer to keep getting unique combinations until you run out, and only then start over, you can do something similar to Mike's suggestion: (demo2)

INSERT INTO test (contract,amount0, amount1, price)
SELECT contract,amount0, amount1, price
FROM (values (50),(60),(80),(100),(200)) a(amount0),
     (values (50),(60),(80),(100),(200)) b(amount1),
     (values (1.5), (1.8), (2.1), (2.5)) c(price),
     (values ('abc'),('klm'),('xyz')   ) d(contract),
     generate_series(1, 1e2, 1) duplicator(n)
order by n, random()
limit 100;

Here's a function if you find yourself in need of a weighted random, meaning that you want to define exactly how likely an option is, compared to the others.

The reason why your solution didn't work is that since your scalar subqueries don't depend in any way on the outer query, they were evaluated only once and re-used. You can check the plan by running it with explain analyze verbose. You could trick the planner into thinking they somehow rely on the outer query by adding an outside reference, even if it does nothing (demo3) but the above does the same, in less code.

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8