sqlpostgresqlstatisticsprobability

# Select random row from a PostgreSQL table with weighted row probabilities

Example input:

```SELECT * FROM test;
id | percent
----+----------
1 | 50
2 | 35
3 | 15
(3 rows)
```

How would you write such query, that on average 50% of time i could get the row with id=1, 35% of time row with id=2, and 15% of time row with id=3?

I tried something like `SELECT id FROM test ORDER BY p * random() DESC LIMIT 1`, but it gives wrong results. After 10,000 runs I get a distribution like: `{1=6293, 2=3302, 3=405}`, but I expected the distribution to be nearly: `{1=5000, 2=3500, 3=1500}`.

Any ideas?

Solution

• This should do the trick:

``````WITH CTE AS (
SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R
)
SELECT *
FROM (
SELECT id, SUM(percent) OVER (ORDER BY id) S, R
FROM YOUR_TABLE CROSS JOIN CTE
) Q
WHERE S >= R
ORDER BY id
LIMIT 1;
``````

The sub-query `Q` gives the following result:

``````1  50
2  85
3  100
``````

We then simply generate a random number in range [0, 100) and pick the first row that is at or beyond that number (the `WHERE` clause). We use common table expression (`WITH`) to ensure the random number is calculated only once.

BTW, the `SELECT SUM(percent) FROM YOUR_TABLE` allows you to have any weights in `percent` - they don't strictly need to be percentages (i.e. add-up to 100).

[SQL Fiddle]