Search code examples
sql-serversql-updateinner-joinone-to-many

UPDATE with a one-to-many JOIN (multiple columns)


I've run into an issue and I'm not sure if it is the intended behaviour. I have searched online for an answer, but all I could find was about what row would update last and so what would be the value after the query. I know there is no such thing as order in that case, and you can't be sure what the value would be ahead of time.

In my case though, I'm updating different columns, so overriding a previous update is not a concern of mine.

CREATE TABLE #original (id int, value1 int, value2 int)
INSERT INTO #original (id) VALUES (1), (2)

CREATE TABLE #temp (id int, name varchar(10), value int)
INSERT INTO #temp (id, name, value) VALUES (1, 'value1', 10), (1, 'value2', 11), (2, 'value1', 20), (2, 'value2', 21)

SELECT * FROM #original

id          value1      value2
----------- ----------- -----------
1           NULL        NULL
2           NULL        NULL

SELECT * FROM #temp

id          name       value
----------- ---------- -----------
1           value1     10
1           value2     11
2           value1     20
2           value2     21

UPDATE O SET 
    value1 = CASE WHEN T.name = 'value1' THEN T.value ELSE value1 END,
    value2 = CASE WHEN T.name = 'value2' THEN T.value ELSE value2 END
FROM
    #original O
        INNER JOIN #temp T ON T.id = O.id

SELECT * FROM #original

id          value1      value2
----------- ----------- -----------
1           10          NULL
2           20          NULL

I don't get why both value2 are NULL.

SELECT
    O.id,
    CASE WHEN T.name = 'value1' THEN T.value ELSE value1 END AS value1,
    CASE WHEN T.name = 'value2' THEN T.value ELSE value2 END AS value2
FROM
    #original O
        INNER JOIN #temp T ON T.id = O.id

id          value1      value2
----------- ----------- -----------
1           10          NULL
1           NULL        11
2           20          NULL
2           NULL        21

Running the above instead of the update, it looks exactly like what I thought it would and I would assume it means "four" updates, populating both value1 and value2 in both rows.

I would really appreciate if someone could explain this one to me.


Solution

  • The problem here is your assumption: "In my case though, I'm updating different columns, so overriding a previous update is not a concern of mine." It most certainly is a problem, because ultimately, your code

    value1 = CASE WHEN T.name = 'value1' THEN T.value ELSE value1 END,
    value2 = CASE WHEN T.name = 'value2' THEN T.value ELSE value2 END
    

    only runs (or takes effect) once per row. SQL Server is not required to update the same row twice, and will normally just arbitrarily take a single row's values, one of which is NULL.

    The documentation states (my bold):

    Use caution when specifying the FROM clause to provide the criteria for the update operation. The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic. For example, in the UPDATE statement in the following script, both rows in Table1 meet the qualifications of the FROM clause in the UPDATE statement; but it is undefined which row from Table1 is used to update the row in Table2.

    Only one row may be used and one update done, you cannot assume that the updates will happen sequentially. So you need to make sure you have a single match for each row you want to update.

    Therefore you should pre-aggregate your values

    UPDATE O SET 
        value1 = T.value1,
        value2 = T.value2
    FROM
        #original O
    INNER JOIN (
        SELECT
          id,
          MAX(CASE WHEN T.name = 'value1' THEN T.value END) value1,
          MAX(CASE WHEN T.name = 'value2' THEN T.value END) value2,
        FROM #temp T
        GROUP BY
          id
    ) T ON T.id = O.id;
    

    You could also use CROSS APPLY or a CTE for this.