Search code examples
mysqlrandom

Join two tables in MySQL, selecting random rows from the first one


I have two tables, table 2 has one row per variable, while table 1 has many rows per variable. I want to randomly select 1000 rows from table 2 (1000 variables) and then join those to table 1. Final table will have many rows per variable. I'm also adding conditions for the selection from both tables. This is my code but it's not picking 1000 variables from table 2:

SELECT t1.var1, t1.var2, t2.var3, t2.var4
FROM table1 t1
RIGHT JOIN (SELECT var1, var3 FROM table2 ORDER BY RAND() LIMIT 1000) t2 
ON t1.var1 = t2.var2
WHERE t1.var2 < 90 AND t2.var4 = "asd";

What am I missing? Also, I've tested this putting a limit of 10 variables, and sometimes I get zero rows...


Solution

  • Your query could easily filter out all results.

    SELECT t1.var1, t1.var2, t2.var3, t2.var4
    FROM table1 t1
    RIGHT JOIN (SELECT var1, var3 FROM table2 ORDER BY RAND() LIMIT 1000) t2 
    ON t1.var1 = t2.var2
    WHERE t1.var2 < 90 AND t2.var4 = "asd";
    

    This selects the 1000 random rows before testing for the rows where t2.var4 = 'asd'. I assume not all rows match that condition. So it's possible that none of the 1000 randomly selected rows match that condition.

    Also you are testing t1.var2 < 90 after the outer join, which means only rows of t1 that have at least one corresponding row of t2 matching that range condition will be included. This also creates the possibility that the query returns no results, because not all rows of t1 have such corresponding rows of t2.

    You should be able to fix both problems.

    SELECT t1.var1, t1.var2, t2.var3, t2.var4
    FROM table1 t1
    RIGHT JOIN (
      SELECT var1, var3 FROM table2 
      WHERE var4 = 'asd' ORDER BY RAND() LIMIT 1000
    ) t2 
    ON t1.var1 = t2.var2 AND t1.var2 < 90;
    

    This puts the condition on t2.var4 inside the subquery, so the subquery returns 1000 randomly selected rows from the subset of rows that match.

    Also putting the condition on t1.var2 into the join condition means that rows of t2 will be returned even if they they have no corresponding rows of t1 that match that condition.