Search code examples
mysqlinner-joincross-join

How to optimize a JOIN of two tables that require lookups on the column to join on?


I am doing a simple INNER JOIN of two tables with the additional requirement that each table must first be joined with a separate 1:n table to get the correct join values. Illustrating with a simple example, I have three tables UserInstalls (UserId, InstallDate), UserConversion (UserId, ConversionDate), and UserAccounts (UserId, AccountId). I want a final result of (AccountId, InstallDate, ConversionDate).

UserInstalls
+--------+-------------+
| UserId | InstallDate |
+--------+-------------+
|      1 | 2015-01-11  |
|      2 | 2015-03-21  |
|      3 | 2015-02-05  |
|      4 | 2014-06-01  |
|      5 | 2014-01-23  |
|      6 | 2015-02-14  |
+--------+-------------+
UserConversion
+--------+----------------+
| UserId | ConversionDate |
+--------+----------------+
|      7 | 2015-01-15     |
|      8 | 2015-03-03     |
|      9 | 2015-08-01     |
|     10 | 2015-01-02     |
|     11 | 2014-01-24     |
|     12 | 2015-02-17     |
+--------+----------------+
UserAccounts
+--------+-----------+
| UserId | AccountId |
+--------+-----------+
|      1 |         1 |
|      7 |         1 |
|      2 |         2 |
|      8 |         2 |
|      3 |         3 |
|      9 |         3 |
|      4 |         4 |
|     10 |         4 |
|      5 |         5 |
|     11 |         5 |
|      6 |         6 |
|     12 |         6 |
+--------+-----------+

I can simply use subqueries to lookup the AccountID, and then join the results of each subquery. In this trivial case this works fine. In practical applications involving (10,000+ rows), this is not efficient and I run into spool limitations.

SELECT x.AccountID, InstallDate, ConversionDate
FROM (SELECT AccountID, InstallDate FROM UserInstalls
    JOIN UserAccounts
    ON UserInstalls.UserId = UserAccounts.UserId) x
JOIN (SELECT AccountID, ConversionDate FROM UserConversion 
    JOIN  UserAccounts
    UserConversion.UserId = UserAccounts.UserId) y
ON x.AccountId = y.AccountId;

Any ideas to do this without subqueries or creating multiple tables via a script? Cross-joins? Relational division?

Thanks!


Solution

  • You should be able to just join the tables together without subqueries:

    SELECT ua1.AccountID, ui.InstallDate, uc.ConversionDate
    FROM UserAccounts ua1
    JOIN UserInstalls ui on ua1.UserId = ui.UserId
    JOIN UserAccounts ua2 on ua1.AccountID = ua2.AccountID
    JOIN UserConversion uc on uc.UserId = ua2.UserId