mysql

Setting value in a table based on another table


There are four tables: trunk, branch, leaf and data. The trunk table has IDTrunk and TrunkNumber columns. The branch table has IDBranch, IDTrunk and BranchNumber columns. The leaf table has IDLeaf, IDBranch and LeafNumber columns. The data table has IDData, TrunkNumber, BranchNumber, LeafNumber and LastEditTime columns. In each table, the IDx column is the primary key and the xNumber columns are the numbers shown to the user. Please note that the data table holds the user-facing numbers, not the TBL-ID (trunk, branch, leaf ID) primary keys.

As the names imply, the trunk, branch and leaf tables have a parent-child relationship: an element can have multiple children but only a single parent. A child knows its parent through the IDx column, but a parent doesn't know its children directly. Data is only associated with a TBL-ID combination, which effectively means that only a leaf can have data. A data row can also be associated with no leaf at all, in which case its xNumber columns contain null. Any given TBL-ID combination can only appear once.

It was noticed that once the Data table grows past 200k entries, retrieving the latest Data entry for any given TBL-ID takes too much time. It was decided to add a LatestDataID column to the leaf table to speed this up. This new column would contain the IDData of the latest data entry for the associated leaf. If there is no data associated with a leaf, it can hold -1, null or some other value that makes it obvious that the leaf has no data. It is currently set to -1, but that can be changed if it causes problems.

Adding the new column and updating it when new data is pushed is simple enough. The issue is that it is not known how to populate the LatestDataID column for the existing rows when the database structure is updated. Ideally, this would be done with a single query, which will be executed only once, when the database structure is updated to contain the new column.
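
For reference, adding the column might look like the sketch below. The INT type is an assumption (it should match whatever type IDData really has), and -1 as the "no data" marker only works if that type is signed; a nullable column would work with any type.

```sql
-- Sketch only: the column type is an assumption, match it to data.IDData.
-- DEFAULT -1 marks "no data yet" for every existing and future leaf.
ALTER TABLE leaf
    ADD COLUMN LatestDataID INT NOT NULL DEFAULT -1;
```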

For example:

Trunk Table

IDTrunk  TrunkNumber
1        10
2        5

Branch Table

IDBranch  IDTrunk  BranchNumber
1         1        1
2         1        2

Leaf Table

IDLeaf  IDBranch  LeafNumber  LatestDataID (new column to update)
1       1         5           1
2       1         6           4
3       1         7           -1
4       1         10          5
5       2         5           7
6       2         6           -1

Data Table

IDData  TrunkNumber  BranchNumber  LeafNumber  LastEditTime
1       10           1             5           9h50
2       10           1             5           8h50
3       10           1             6           7h00
4       10           1             6           7h30
5       10           1             10          12h00
6       null         null          null        10h00
7       10           2             5           10h00

Using this query:

SELECT trunk.TrunkNumber, branch.BranchNumber, leaf.LeafNumber
    FROM leaf
    INNER JOIN branch ON leaf.IDBranch = branch.IDBranch
    INNER JOIN trunk ON branch.IDTrunk = trunk.IDTrunk;

The result is

TrunkNumber  BranchNumber  LeafNumber
10           1             5
10           1             6
10           1             7
10           1             10
10           2             5
10           2             6

What should be done to update Leaf.LatestDataID properly?

--EDIT--

LastEditTime is a proper timestamp. It was written like that in the example for the sake of brevity.

--EDIT 2--

MySQL 5.7 is used. If it is simpler, a procedure could also be used.


Solution

  • This assumes MySQL 8+. If you are still using 5.7 or earlier, it is way past time to upgrade.

    This will provide the desired update based on what I understand of your question:

    UPDATE leaf
    LEFT JOIN (
        SELECT l.IDLeaf, d.IDData,
            ROW_NUMBER() OVER (PARTITION BY l.IDLeaf ORDER BY d.LastEditTime DESC) AS rn
        FROM leaf l
        JOIN branch b
            ON l.IDBranch = b.IDBranch
        JOIN trunk t
            ON b.IDTrunk = t.IDTrunk
        JOIN data d
            ON l.LeafNumber = d.LeafNumber
            AND b.BranchNumber = d.BranchNumber
            AND t.TrunkNumber = d.TrunkNumber
    ) latest
        ON leaf.IDLeaf = latest.IDLeaf AND latest.rn = 1
    SET leaf.LatestDataID = COALESCE(latest.IDData, -1);
    

    Here's a db<>fiddle.
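
    A quick way to check the outcome is to list the leaf table after running the update. This is just a sanity check against the sample data in the question, not part of the migration itself.

    ```sql
    -- With the sample rows from the question, LatestDataID should come back
    -- as 1, 4, -1, 5, 7, -1 for IDLeaf 1 through 6.
    SELECT IDLeaf, IDBranch, LeafNumber, LatestDataID
    FROM leaf
    ORDER BY IDLeaf;
    ```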

    We do not know how your application is interacting with the db, but it seems likely that you would be much better off fixing your data model, as opposed to introducing redundant data while the dataset is so small.

    Replacing TrunkNumber, BranchNumber, LeafNumber in Data with just IDLeaf would seem like a much better option.
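
    A rough sketch of that change is shown below. It assumes IDLeaf is an INT and relies on each TrunkNumber, BranchNumber, LeafNumber combination identifying exactly one leaf, as stated in the question. The index name idx_data_leaf_time is made up for illustration, and dropping the three old columns is left out because the application may still read them.

    ```sql
    -- Assumption: IDLeaf is an INT; data rows with no leaf keep NULL.
    ALTER TABLE data
        ADD COLUMN IDLeaf INT NULL;

    -- Resolve the user-facing numbers to the leaf primary key once.
    UPDATE data d
    JOIN trunk t  ON t.TrunkNumber = d.TrunkNumber
    JOIN branch b ON b.IDTrunk = t.IDTrunk
                 AND b.BranchNumber = d.BranchNumber
    JOIN leaf l   ON l.IDBranch = b.IDBranch
                 AND l.LeafNumber = d.LeafNumber
    SET d.IDLeaf = l.IDLeaf;

    -- Hypothetical index name; lets "latest row per leaf" use an index.
    CREATE INDEX idx_data_leaf_time ON data (IDLeaf, LastEditTime);

    -- Latest data entry for one leaf (IDLeaf = 2 is just an example value).
    SELECT d.*
    FROM data d
    WHERE d.IDLeaf = 2
    ORDER BY d.LastEditTime DESC
    LIMIT 1;
    ```

    With such an index in place, the lookup no longer needs to join through trunk and branch at all.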


    For MySQL < 8 you can achieve the same result with user-defined variables that compute the row number and track the previous IDLeaf:

    UPDATE leaf
    LEFT JOIN (
        SELECT IDLeaf, IDData, @rn := IF(IDLeaf = @prevIDLeaf, @rn + 1, 1) AS rn, @prevIDLeaf := IDLeaf
        FROM (
            SELECT l.IDLeaf, d.IDData, d.LastEditTime
            FROM leaf l
            JOIN branch b ON l.IDBranch = b.IDBranch
            JOIN trunk t ON b.IDTrunk = t.IDTrunk
            JOIN data d ON l.LeafNumber = d.LeafNumber AND b.BranchNumber = d.BranchNumber AND t.TrunkNumber = d.TrunkNumber
            ORDER BY l.IDLeaf ASC, d.LastEditTime DESC  
        ) AS t
        JOIN (SELECT @prevIDLeaf := null, @rn := 0) init
    ) AS latest ON leaf.IDLeaf = latest.IDLeaf AND latest.rn = 1
    SET leaf.LatestDataID = COALESCE(latest.IDData, -1);
    

    Here's another db<>fiddle.
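
    If you would rather not depend on user variables at all, a correlated subquery is another option that should also run on 5.7. This is an untested sketch, not a drop-in replacement: the IDData tie-breaker for rows sharing the same LastEditTime is an assumption, and because of the inner joins, a leaf without a matching branch or trunk is left untouched rather than set to -1.

    ```sql
    UPDATE leaf l
    JOIN branch b ON l.IDBranch = b.IDBranch
    JOIN trunk t  ON b.IDTrunk = t.IDTrunk
    SET l.LatestDataID = COALESCE((
            -- Newest data row for this trunk/branch/leaf number combination.
            SELECT d.IDData
            FROM data d
            WHERE d.LeafNumber = l.LeafNumber
              AND d.BranchNumber = b.BranchNumber
              AND d.TrunkNumber = t.TrunkNumber
            ORDER BY d.LastEditTime DESC, d.IDData DESC  -- tie-breaker is an assumption
            LIMIT 1
        ), -1);  -- no matching data row: fall back to -1
    ```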