Search code examples
postgresqlrandomsql-order-bysql-limitdistinct-on

Postgresql update 200 random records


I've got a query returning 200 random distinct foo from my Database. I want to update the notes field for those rows.

select distinct on (foo, random()) *
from test
order by random(), foo
limit 200
update test
notes = 'Flag'
order by random()
limit 200

My first thought was to use order and limit inside update, but those don't work there. Another option I thought was to use the select inside the update.

But I'm not really sure, that's why I came here. Any ideas to get sql to update 200 distinct random rows would be great.


Solution

  • Assuming foo is your primary key, you could do a CTE and a semi-join:

    with cte as (
      select foo
      from test 
      order by random()
      limit 200
    )
    update test t
    set notes = 'Flag'
    where exists (
      select null
      from cte c
      where t.foo = c.foo
    )
    

    This would be pretty efficient for large datasets.

    You can also try this, which is a little simpler to follow but is less scalable in terms of performance. For 200 rows, it should be fine.

    update test f1
    set notes = 'Flag'
    where foo in (select foo from test order by random() limit 200)
    

    If foo is not the primary key, just use whatever is. If you don't have a PK, you can use all columns, but there is no guarantee you would only update 200 if that happens.