Example input:
SELECT * FROM test; id | percent ----+---------- 1 | 50 2 | 35 3 | 15 (3 rows)
How would you write such query, that on average 50% of time i could get the row with id=1, 35% of time row with id=2, and 15% of time row with id=3?
I tried something like SELECT id FROM test ORDER BY p * random() DESC LIMIT 1
, but it gives wrong results. After 10,000 runs I get a distribution like: {1=6293, 2=3302, 3=405}
, but I expected the distribution to be nearly: {1=5000, 2=3500, 3=1500}
.
Any ideas?
This should do the trick:
WITH CTE AS (
SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R
)
SELECT *
FROM (
SELECT id, SUM(percent) OVER (ORDER BY id) S, R
FROM YOUR_TABLE CROSS JOIN CTE
) Q
WHERE S >= R
ORDER BY id
LIMIT 1;
The sub-query Q
gives the following result:
1 50
2 85
3 100
We then simply generate a random number in range [0, 100) and pick the first row that is at or beyond that number (the WHERE
clause). We use common table expression (WITH
) to ensure the random number is calculated only once.
BTW, the SELECT SUM(percent) FROM YOUR_TABLE
allows you to have any weights in percent
- they don't strictly need to be percentages (i.e. add-up to 100).