Search code examples
sqlalgorithmsortingrandomprobability

Infinite scroll algorithm for random items with different weight ( probability to show to the user )


I have a web / mobile application that should display an infinite scroll view (the continuation of the list of items is loaded periodically in a dynamic way) with items where each of the items have a weight, the bigger is the weight in comparison to the weights of other items the higher should be the chances/probability to load the item and display it in the list for the users, the items should be loaded randomly, just the chances for the items to be in the list should be different.

I am searching for an efficient algorithm / solution or at least hints that would help me achieve that.

Some points worth to mention:

  • the weight has those boundaries: 0 <= w < infinite.
  • the weight is not a static value, it can change over time based on some item properties.
  • every item with a weight higher than 0 should have a chance to be displayed to the user even if the weight is significantly lower than the weight of other items.
  • when the users scrolls and performs multiple requests to API, he/she should not see duplicate items or at least the chance should be low.
  • I use a SQL Database (PostgreSQL) for storing items so the solution should be efficient for this type of database. (It shouldn't be a purely SQL solution)

Hope I didn't miss anything important. Let me know if I did.


Solution

  • The following are some ideas to implement the solution:

    The database table should have a column where each entry is a number generated as follows:

    • log(R) / W,

    where—

    • W is the record's weight greater than 0 (itself its own column), and
    • R is a per-record uniform random number in (0, 1)

    (see also Arratia, R., "On the amount of dependence in the prime factorization of a uniform random integer", 2002). Then take the records with the highest values of that column as the need arises.

    However, note that SQL has no standard way to generate random numbers; DBMSs that implement SQL have their own ways to do so (such as RANDOM() for PostgreSQL), but how they work depends on the DBMS (for example, compare MySQL's RAND() with T-SQL's NEWID()).

    EDIT (Mar. 5, 2023): Note that for this solution, the weight expresses how likely a given item will appear first in the sample. This weight is not necessarily the chance that a given sample of n items will include that item (that is, an inclusion probability). The solution given above will not necessarily ensure that a given item will appear in a random sample with probability proportional to its weight; for that, see "Algorithms of sampling with equal or unequal probabilities".