Search code examples
mysqllag

Simulating LAG in MySQL during an update


Yesterday an interesting question was asked which required updating a MySQL table using the LAG. Consider the following input table (left), and the desired output (right):

**INPUT**                                 **OUTPUT**
ID  TestDate    PerformanceStatus (PS)    ID  TestDate    PS  PreviousPerformanceStatus
1   15/03/2016  0                         1   15/03/2016  0   0
1   01/04/2016  2                         1   01/04/2016  2   0
1   05/05/2016  1                         1   05/05/2016  1   2
1   07/06/2016  1                         1   07/06/2016  1   1
2   15/03/2016  0                         2   15/03/2016  0   1
2   01/04/2016  2                         2   01/04/2016  2   0
2   05/05/2016  1                         2   05/05/2016  1   2
2   07/06/2016  3                         2   07/06/2016  3   1
2   23/08/2016  1                         2   23/08/2016  1   3

In other words, the goal is to assign to PreviousPerformanceStatus the value which existed in the record coming before, as ordered by ID then TestDate.

The accepted answer, given by @spencer7593, used a correlated subquery. However, what popped into my head first was to use a user variable. Here is how I answered:

SET @lag = 0;
UPDATE yourTable
SET PreviousPerformanceStatus = @lag,
    @lag:=PerformanceStatus
ORDER BY ID, TestDate

I was told that this answer is unstable, but I was wondering if someone could explain why something might go wrong, what would be happening in that case, and finally what could we do to use a user variable here to simulate LAG.

It is my understanding that the following SELECT query would have no issues at all:

SELECT PerformanceStatus,
       @lag AS PreviousPerformanceStatus,
       @lag:=PerformanceStatus
FROM yourTable
ORDER BY ID, TestDate

However, when doing an UPDATE there are other considerations to take into account.


Solution

  • I don't think you can set variables in an update statement. Here's my reasoning- Given this

    drop table if exists t;
    
    create table t (ID int, TestDate date,   PerformanceStatus int, previousperformancestatus int);
    insert into t values
    (1 ,  '2016-03-15' , 0, null),                         
    (1 ,  '2016-04-01' , 2, null),                         
    (1 ,  '2016-05-05' , 1, null),                         
    (1 ,  '2016-06-07' , 1, null),                         
    (2 ,  '2016-03-15' , 0, null),                         
    (2 ,  '2016-04-01' , 2, null),                         
    (2 ,  '2016-05-05' , 1, null),                         
    (2 ,  '2016-06-07' , 3, null),                         
    (2 ,  '2016-08-23' , 1, null)
    ;
    

    This code fails

    MariaDB [sandbox]> SET @lag = 0;
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [sandbox]> UPDATE T
        -> SET previousPerformanceStatus = @lag ,
        ->     @lag:=PerformanceStatus
        -> ORDER BY ID, TestDate;
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '@lag:=PerformanceStatus
    ORDER BY ID, TestDate' at line 3
    

    Commenting out @lag:=PerformanceStatus This code runs

    MariaDB [sandbox]> SET @lag = 0;
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [sandbox]> UPDATE T
        -> SET previousPerformanceStatus = @lag
        -> #,@lag:=PerformanceStatus
        -> ORDER BY ID, TestDate;
    Query OK, 0 rows affected (0.01 sec)
    Rows matched: 9  Changed: 0  Warnings: 0
    

    Since code at least runs without error and the manual https://dev.mysql.com/doc/refman/5.7/en/update.html states "The SET clause indicates which columns to modify " my take on this is that you cannot set variables in an update statement so simulating lag is not possible using this method.