Search code examples
mysqlrandomselectionweighted

selecting weighted random distribution from a mysql table


i'm trying to write a query that will select a number of articles from a table randomly, but the articles have weighted chances to be chosen. i've come up with a solution, but it seems clumsy to me, and i was wondering if anyone has any ideas on how to do it better. i need at least 1 article, but it would be helpful if the query returned several at once.

here is my approach :

the table --

mysql> describe randomiser;
+---------+----------------------+------+-----+---------+----------------+
| Field   | Type                 | Null | Key | Default | Extra          |
+---------+----------------------+------+-----+---------+----------------+
| id      | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| article | varchar(30)          | YES  |     | NULL    |                |
| chance  | smallint(5) unsigned | NO   | MUL | 1       |                |
| low     | int(10) unsigned     | NO   | MUL | 0       |                |
| high    | int(10) unsigned     | NO   |     | 0       |                |
+---------+----------------------+------+-----+---------+----------------+

my test population --

mysql> select * from randomiser;
+----+-------------+--------+-----+------+
| id | article     | chance | low | high |
+----+-------------+--------+-----+------+
|  1 | common      |    128 |   1 |  128 |
|  2 | uncommon    |     64 | 129 |  192 |
|  3 | infrequent1 |     32 | 193 |  224 |
|  4 | infrequent2 |     32 | 225 |  256 |
|  5 | infrequent3 |     32 | 257 |  288 |
+----+-------------+--------+-----+------+

the low and high values are updated on insert, any time someone adds a new article to the table.

my selection method --

SET @t:=(SELECT FLOOR( SUM(chance) * RAND() + 1) FROM randomiser);
SELECT article FROM randomiser WHERE @t >= low AND @t <= high;
  1. is it possible to combine the selects into a single, efficient statement ?
  2. is it possible to write a select that will pull several random values instead of only one ?

NOTE - i am not at all attached to the table as i've defined it ; if there's a different sort of layout so it will be more efficient, i would like to know !


Solution

  • For one query, you can do it like this:

    SELECT article
    FROM randomiser
    WHERE (SELECT FLOOR( SUM(chance) * RAND() + 1) FROM randomiser) BETWEEN low AND high;
    

    SQLFiddle Demo

    Or use INNER JOIN:

    SELECT article, `range`
    FROM randomiser
    INNER JOIN (
        SELECT
            FLOOR( SUM(chance) * RAND() + 1) AS `range`
        FROM randomiser
    ) t
    WHERE `range` >= low AND `range` <= high;
    

    SQLFiddle Demo