Search code examples
mysqlt-sqlcoalesce

COALESCE function in SELECT Clause


Problem: is to find the last Non NULL value from the table based on id order, found here http://sqlmag.com/t-sql/last-non-null-puzzle

CREATE TABLE Score(`id` int, `col1` INT NULL);
INSERT INTO Score
    (`id`, `col1`)
VALUES
    (2, NULL),
    (3, 10),
    (5, -1),
    (7, NULL),
    (11, NULL),
    (13, -12),
    (17, NULL),
    (19, NULL),
    (23, 1759);

Desired Output:

id          col1        lastval
----------- ----------- -----------
2           NULL        NULL
3           10          10
5           -1          -1
7           NULL        -1
11          NULL        -1
13          -12         -12
17          NULL        -12
19          NULL        -12
23          1759        1759

My Code in SQLFiddle MySQL 5.6

SELECT s1.id, COALESCE(s2.col1)
FROM score s1 JOIN score s2
ON s1.id >= s2.id
GROUP BY s1.id
ORDER BY s1.id, s2.id DESC

COALESCE as I understand gives the first non NULL value encountered in a list. But It is giving (null) for all the columns. Can you point why COALESCE is not working? Also I realized removing the COALESCE should also answer the question as GROUP BY clause returns the first value in the s2.col1 column. I might be wrong here.


Solution

  • COALESCE() returns the first non-null value in its list of arguments. There is no implicit behavior of applying the COALESCE() to the same expression over the set of grouped rows. It is not an aggregating function like SUM() or GROUP_CONCAT().

    It makes no sense to use it with one argument as you're doing, because you've only given it one argument. So it's exactly the same as putting s2.col1 in your query's select-list without putting it inside a COALESCE() call.

    Then you're left with the default behavior of MySQL when referencing a non-grouped column in a GROUP BY function: MySQL chooses arbitrarily the value of s2.col1 from some row in the group. In practice, this is the first row in the group read in index order, but you shouldn't depend on this.

    So in your example, your join joins each row s1 to all rows s2 such that s2.id is earlier. Every group includes the first row in the table where s2.id=2. And that row has a NULL for col1.

    I read through the SQL puzzle you're trying to solve. It's going to be particularly awkward in MySQL 5.x, because MySQL doesn't support windowing functions until version 8.0.2.

    The solution in these cases usually involves using MySQL User-Defined Variables that change value as your query iterates over the rows in the table. This can be quite tricky.


    Here's how I would solve the problem in a portable way (with no MySQL user variables):

    SELECT s1.id, s1.col1, s2.col1 as lastval
    FROM Score AS s1
    LEFT OUTER JOIN Score AS s2 ON s2.id <= s1.id AND s2.col1 IS NOT NULL
    LEFT OUTER JOIN Score AS s3 ON s3.id > s2.id AND s3.id <= s1.id AND s3.col1 IS NOT NULL
    WHERE s3.id IS NULL
    ORDER BY s1.id;
    

    Here's how one could solve it using a MySQL user variable:

    SELECT id, col1, @lastval := COALESCE(col1, @lastval) AS lastval
    FROM (SELECT @lastval := NULL) AS _init 
    CROSS JOIN Score
    ORDER BY id;
    

    I would avoid the query you posted in your comment below, because it uses a correlated subquery. Correlated subqueries are usually bad for performance.