Search code examples
mysqlsqlwindow-functionsmysql-5.6

Simulate LAG with GROUP BY in MySQL 5.6


How do I simulate LAG function from MySQL 8.0 in MySQL 5.6 where I get the previous info with same ItemID. I've created this illustration to simulate the table and the output of the query I need. The gray data is the original table and the orange is the data from the previous row with same ItemID and the closest previous date.

enter image description here

I've tried joining the table with it self on A.ItemID = B.ItemID AND B.Date < A.Date. I've also tried group by ItemID and tried getting the max date below the current date.

I've tried this aswell.

SET @lag = -1;
SELECT *, @lag PreviousInfo, @lag:=Info CurrentInfo FROM Table1

However this always just returns the info from the previous row and is not grouped by the ItemID.


Solution

  • You could use two correlated subqueries:

    select
        t.*,
        (
            select date 
            from mytable t1 
            where t1.itemid = t.itemid and t1.date < t.date
            order by t.date desc
            limit 1
        ) previous_date,
        (
            select info 
            from mytable t1 
            where t1.itemid = t.itemid and t1.date < t.date
            order by t.date desc
            limit 1
        ) previous_info
    from mytable t
    

    However this does not scale well when you need to recover more columns from the previous record. In that case, we can self-join with a not exists condition to filter on the previous record:

    select
        t.*,
        tlag.date previous_date,
        tlag.info previous_info
    from mytable t
    left join mytable tlag
        on  tlag.itemid = t.itemid
        and tlag.date   < t.date
        and not exists (
            select 1
            from mytable t1
            where t1.itemid = t.itemid and t1.date < t.date and t1.date > tlag.date
        )
    

    For performance of both queries, consider the following an index on (item_id, date). You might want to add info to the index, like: (item_id, date, info), especially with the first query, so both subqueries are covered by the index.