Search code examples
mysqldatetimewindow-functionspartition

Mysql last_value window function on a datetime column


Below is the data set I am working with:

customer_id,        event_date,             status,         credit_limit
1,                  2019-1-1,                   C,                  1000
1,                  2019-1-5,                   F,                  1000
1,                  2019-3-10,             [NULL],                  1000
1,                  2019-3-10,              [NULL],                 1000
1,                  2019-8-27,                  L,                  1000
2,                  2019-1-1,                   L,                  2000
2,                  2019-1-5,               [NULL],                 2500
2,                  2019-3-10,              [NULL],                 2500
3,                  2019-1-1,                   S,                  5000
3,                  2019-1-5,               [NULL],                 6000
3,                  2019-3-10,                  B,                  5000
4,                  2019-3-10,                  B,                  10000

I am trying to solve for the following:

For each customer_id, show account status at month end for the year 2019

I have tried using windows function last_value(), but it does not give me the latest date in a month. Here is my query:

with cte1 as 
(select customer_id, status, 
event_date,
last_value(date_format(event_date, '%Y-%m-%d')) over ( partition by customer_id, event_date
order by event_date) as l_v
from cust_acct ca 
where event_date between "2019-01-01 00:00:00" and "2019-12-31 11:59:59")
select * from cte1

It returns:

Customer_id,    Status,         Event_date,                 L_v
1,          C,          2019-01-01 00:00:00,            2019-01-01
1,          F,          2019-01-05 00:00:00,            2019-01-05
1,      [NULL],         2019-03-10 00:00:00,            2019-03-10
1,      [NULL],         2019-03-10 00:00:00,            2019-03-10
1,          L,          2019-08-27 00:00:00,            2019-08-27
2,          L,          2019-01-01 00:00:00,            2019-01-01
2,      [NULL],         2019-01-05 00:00:00,            2019-01-05
2,      [NULL],         2019-03-10 00:00:00,            2019-03-10
3,          S,          2019-01-01 00:00:00,            2019-01-01
3,      [NULL],         2019-01-05 00:00:00,            2019-01-05
3,          B,          2019-03-10 00:00:00,            2019-03-10
4,          B,          2019-03-10 00:00:00,            2019-03-10

Customer_id 1, for month 2019-01, should have a last_value of '2019-01-05' in the column l_v. Why is the query showing both dates in january in column l_v?


Solution

  • LAST_VALUE() is not the proper window function in this case.
    It can be used only if you extend the window:

    WITH cte1 AS (
      SELECT customer_id, status, event_date,
      LAST_VALUE(DATE(event_date)) OVER ( 
        PARTITION BY customer_id, DATE_FORMAT(event_date, '%Y-%m')
        ORDER BY event_date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
      ) AS l_v
      FROM cust_acct ca 
      WHERE event_date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 11:59:59'
    )
    SELECT * FROM cte1;
    

    You should use FIRST_VALUE():

    WITH cte1 AS (
      SELECT customer_id, status, event_date,
      FIRST_VALUE(DATE(event_date)) OVER ( 
        PARTITION BY customer_id, DATE_FORMAT(event_date, '%Y-%m')
        ORDER BY event_date DESC
      ) AS l_v
      FROM cust_acct ca 
      WHERE event_date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 11:59:59'
    )
    SELECT * FROM cte1;
    

    or better MAX():

    WITH cte1 AS (
      SELECT customer_id, status, event_date,
      MAX(DATE(event_date)) OVER ( 
        PARTITION BY customer_id, DATE_FORMAT(event_date, '%Y-%m')
      ) AS l_v
      FROM cust_acct ca 
      WHERE event_date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 11:59:59'
    )
    SELECT * FROM cte1;
    

    See the demo.