Search code examples
mysqlgreatest-n-per-groupmysql-5.7

How to do the equivalent of a lateral join in MySQL 5.7?


I'm trying to get the informations from two tables, one table that contains general data about an object A, and B holding the history of supplementary data for this object.

What i want to do is get all the records from A And the corresponding lastest instance of B (highest id) with a fast request, the two tables being quite big. B has a key that allows it to identify which A it is linked to since it is a one to many relation. I also want to get infos if there are some from a third table C, i can get those info from two fields that are present in B. There also can be no C data for a given B, i want to know that too..

From what i seached on the forums, lateral join seems to be the solution, but it does not exist in my version of MySQL.

I also want to be able to filter query with variable conditions in A and B column values.

What i'm trying right now (functionnal if i can wait hours) is:

SELECT a.*, b.*, c.* 
FROM a
    INNER JOIN (SELECT b.* FROM b ORDER BY b.id DESC) b ON b.a_id = a.id
    LEFT JOIN c ON c.val1 = b.val1 AND c.val2 = b.val2
GROUP BY b.id

Solution

  • What your describing is something that window functions like ROW_NUMBER() help with but those too are MySQL 8+. With MySQL 5.7 things are a little trickier but there are a couple of ways to do it.

    SET @rank := 0;
    SET @aid := 0;
    
    SELECT
      a.*, b.*, c.*
    FROM
      a
    INNER JOIN (
        SELECT
          @rank := CASE WHEN b.a_id = @aid THEN @rank + 1 ELSE 1 END AS RankedRow
          ,b.*
          ,@aid := b.a_id
        FROM
          b
        ORDER BY
          b.a_id
          ,b.id DESC
    ) AS b
      ON a.id = b.a_id
      AND b.RankedRow = 1
    LEFT JOIN c
      ON c.val1 = b.val1
      AND c.val2 = b.val2
    

    This uses a variable to hold the current row number and uses a trick to update that value as well as return it. If a_id is the same, it's incremented by 1 but when a_id changes, it is reset to 1.

    Another option is to join b to itself

    SELECT
      a.*, b.*, c.*
    FROM
      a
    INNER JOIN (
      SELECT
        b.*
      FROM
        b
      LEFT JOIN b AS newerb
        ON newerb.a_id = b.a_id
        AND newerb.id > b.id
      WHERE
        newerb.id IS NULL
    ) AS b
      ON a.id = b.a_id
    LEFT JOIN c
      ON c.val1 = b.val1
      AND c.val2 = b.val2
    

    Here newerb will be null if there is no id greater than b for the same a_id and therefore must be the latest row.

    I've used both solutions in the past, I tend to prefer the latter as it's more readable but it may be slower depending on how much data you have.

    I haven't time to test both examples so apologies in advance for any typos I've missed.