Search code examples
mysqlperformancegroupwise-maximum

MySQL Self Join Table Next/Prev Row Performance


I have a table that has a column Sequence that provides ordering for the table. I'm trying to get the next and previous rows joined together based off of Sequence so I can get the previous and next SKU values.

The table is defined as follows:

CREATE TABLE `Builder` (
    `Shipment Number` VARCHAR(50) NULL DEFAULT NULL,
    `SKU` VARCHAR(50) NULL DEFAULT NULL,
    `Pallet Number` VARCHAR(50) NULL DEFAULT NULL,
    `Sequence` INT NULL DEFAULT NULL,

    INDEX `Primary Index` (`Shipment Number`, `Pallet Number`, `Sequence`) USING BTREE
)

My query currently looks something like this for computing the next rows SKU value:

SELECT
    B1.`SKU`,
    B1.`Shipment Number`,
    B1.`Pallet Number`,
    B1.`Sequence`,
    B2.`SKU`
FROM Builder B1

LEFT JOIN Builder B2 ON
    B2.`Sequence.` = (
        SELECT MIN(B3.`Sequence.`)
        FROM Builder B3
        WHERE
            B3.`Sequence` > B1.`Sequence` AND
            B3.`Shipment Number` = B1.`Shipment Number` AND
            B3.`Pallet Number` = B1.`Pallet Number`
    ) AND
    B1.`Shipment Number` = B2.`Shipment Number` AND
    B1.`Pallet Number` = B2.`Pallet Number`

I have added an Index to the Builder table for (Sequence, Shipment Number, Pallet Number).

The query computes the next SKU correctly but the performance is pretty terrible, even running on a subset of my full dataset (50,000 rows) takes a few minutes. I'm not sure if there's anything else I can do to improve this queries performance.

Running on MySQL 8.0.20.

Thanks!


Solution

  • Use LEAD() and LAG() instead of joins. They were made for this.

    For example:

    select
      *,
      lag(`SKU`) over(partition by `Shipment Number`, `Pallet Number`
                      order by `Sequence`) as prev_sku,
      lead(`SKU`) over(partition by `Shipment Number`, `Pallet Number`
                       order by `Sequence`) as next_sku
    from builder
    

    LAG() returns the value of the previous row, according to the specified criteria (partition and ordering), while LEAD() returns the value of the next row according the specified criteria.

    You can also add an optional second parameter (an integer that defaults to 1) to indicate how far away (rows) you want to peek at.