Tags: mysql, sql, database, mysql-variables

Select a random value based on probability


How do I select a random row from the database based on the probability (chance) assigned to each row?
Example:

Make        Chance  Value
ALFA ROMEO  0.0024  20000
AUDI        0.0338  35000
BMW         0.0376  40000
CHEVROLET   0.0087  15000
CITROEN     0.016   15000
........

How do I select a random make and its value, so that each make is chosen with the probability given in its Chance column?

Would a combination of rand() and ORDER BY work? If so, what is the best way to do this?


Solution

  • You can do this by calling rand() once and comparing the result against a cumulative sum of the chances. Assuming the chances add up to 1 (that is, 100%):

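    select t.*
    from (select t.*,
                 -- running total of chance; each row covers [cumep - chance, cumep]
                 (@cumep := @cumep + chance) as cumep
          from t cross join
               -- reset the running total and draw the random number once
               (select @cumep := 0, @r := rand()) params
         ) t
    -- keep the row whose interval contains the random number
    where @r between cumep - chance and cumep
    limit 1;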
    

    Notes:

    • rand() is called once, in a subquery, to initialize a variable. Every call to rand() returns a different value, so calling it per row would compare each row against a different random number.
    • There is a remote chance that the random number falls exactly on the boundary between two rows, in which case both rows match. The limit 1 then arbitrarily keeps one of them.
    • This could be made more efficient by stopping the subquery as soon as cumep > @r.
    • The values do not have to be in any particular order.
    • This can be modified to handle chances where the sum is not equal to 1, but that would be another question.
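
For readers who want to check the logic outside the database, here is a minimal Python sketch of the same cumulative-sum idea. It is an illustration, not part of the original answer: the function name pick_weighted and its r parameter are hypothetical, and the rows are only the five shown in the question (the rest of the table is elided). Because those five chances sum to less than 1, the sketch scales the random number by the total, which is one possible way to handle sums that are not equal to 1.

```python
import bisect
import itertools
import random

# Only the rows visible in the question; the real table has more.
ROWS = [
    ("ALFA ROMEO", 0.0024, 20000),
    ("AUDI", 0.0338, 35000),
    ("BMW", 0.0376, 40000),
    ("CHEVROLET", 0.0087, 15000),
    ("CITROEN", 0.016, 15000),
]


def pick_weighted(rows, r=None):
    """Return one (make, chance, value) row, chosen in proportion to chance.

    r is a number in [0, 1); it is drawn once with random.random() if omitted.
    Passing it explicitly is only meant for testing (hypothetical parameter).
    """
    # Running total of the chances, like cumep in the SQL query.
    cumulative = list(itertools.accumulate(chance for _, chance, _ in rows))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("chances must sum to a positive number")
    if r is None:
        r = random.random()
    # Scale r to [0, total) so the chances do not have to sum to 1.
    target = r * total
    # First row whose running total is strictly greater than target.
    index = bisect.bisect_right(cumulative, target)
    # Guard against floating-point rounding at the very top of the range.
    index = min(index, len(rows) - 1)
    return rows[index]


if __name__ == "__main__":
    make, chance, value = pick_weighted(ROWS)
    print(make, value)
```

Using strict intervals via bisect_right means a random number that lands exactly on a boundary always goes to the later row, so there is no tie to break.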