Search code examples
sqlpostgresqlgenerate-series

How to generate random data on an associated table in postgres?


I have two tables: users and results. A user has many results.

I need to generate 10 million records in our users table and I was able to do this using the generate_series function in postgres.

Now, I want to generate millions of rows in the results table. But I want there to be a certain distribution of these rows. For example, I want 50% of the users to only have 1 result. I want 40% to have 2 results. And I want 10% to have 5 results.

Is there a way to generate this random data in the results table in postgres?


Solution

  • Yes:

    select u.user_id, gs.result
    from (select u.*,
                 ntile(10) over (order by rand()) as decile
          from users u
         ) u cross join lateral
         generate_series(1, (case when u.decile <= 5 then 1 when u.decile <= 9 then 2 else 5 end)) gs(result);
    

    This generates the rows. You can fill in with the data you want.