Search code examples
phpmysqlsqlrandomsimilarity

SELECTing two random rows in an SQL table where the values of a column come very close


I ran into a little problem. I am creating a web app with code snippets. Users are confronted with two random code snippets which are tagged similarly and they must choose which of the two they like the most, or they can skip (similar to Facemash, but with code snippets instead of pictures).

I have table called CodeSnippets with these fields:

  • Id — ID of the snippet
  • Score — Score of the snippet
  • Some more non-important columns

Score is a real number.

My question: how can I write a query that gets two random records with scores that are very close? The problem is that I have to order trice: RAND(), Score and RAND() with a margin.

This is what I already have:

SELECT Id, Score, Code FROM CodeSnippets ORDER BY RAND() LIMIT 2

But this simply picks two random records, even if their scores lie far away from eachother :)

Could anyone point me in the right direction?


Solution

  • Can you do it with a subquery? First have a self-join query to select all pairs with score within a threshold, then take the first random pair:

    SELECT TOP 1 Id1, Id2
    FROM (
      SELECT C1.Id AS Id1, C2.Id AS Id2 FROM CodeSnippets C1, CodeSnippet C2
      WHERE C1.Id <> C2.Id AND ABS(C1.Score-C2.Score) < [Threshold]
    )
    ORDER BY RAND()
    

    Be careful because not all database engines support sub-queries (although most should). Most engines will also optimize/rewrite sub-queries to run fast, but some embedded database engines may just run the sub-query, create a temp table in memory, then run the outer query over the temp table, resulting in poor performance. If that is your case, you'll need to rewrite this query using joins.