Search code examples
mysqlmysql-json

MySQL update run SET for every row from subquery


I have two tables: Files and FileLogs and I'm trying to get rid of the FileLogs table and store some data from it inside the Files table.

Files

id lastStatusId meta
187 101 NULL
188 101 { "foo": "bar" }

FileLogs

id fileId statusId createdAt meta
1 187 101 2019-06-18 19:43:19 NULL
2 187 102 2019-06-18 19:43:22 NULL
3 187 202 2019-06-18 19:45:32 { "foo2": "bar2" }

I already added the lastStatusId column to the Files table (defaults to 101).

To get the latest statusId for every file from the logs into the latestStatusId I run this query and it works great:

UPDATE
  `Files` f1,
  (
    SELECT
      f.id as fileId,
      fl.statusId,
      fl.meta
    FROM
      (
        SELECT
          MAX(id) as id,
          fileId
        FROM
          `FileLogs`
        GROUP BY
          fileId
      ) AS latestIds
      LEFT JOIN `FileLogs` AS fl ON latestIds.id = fl.id
      LEFT JOIN `Files` AS f ON f.id = fl.fileId
    ORDER BY
      fl.id ASC
  ) temp1
SET
  f1.lastStatusId = temp1.statusId
WHERE
  f1.id = temp1.fileId;

I thought I could use a similar query to go over all FileLogs and write the status into the Files' meta field like so:

UPDATE
  `Files` f1,
  (
    SELECT
      f.id as fileId,
      fl.statusId,
      fl.meta,
      fl.createdAt
    FROM
      `FileLogs` AS fl
      LEFT JOIN `Files` AS f ON f.id = fl.fileId
  ) temp1
SET
  f1.meta = JSON_MERGE_PATCH(
    COALESCE(f1.meta, '{}'),
    JSON_OBJECT(
      'statusInfo',
      JSON_OBJECT(
        CONCAT('', temp1.`statusId`),
        JSON_OBJECT(
          'createdAt',
          temp1.createdAt
        )
      )
    )
  )
WHERE
  f1.id = temp1.fileId;

Result of the subquery is something like this:

fileId  statusId    meta            createdAt
187     101         NULL            2019-06-18 19:43:19
187     102         NULL            2019-06-18 19:43:22
187     202         {"foo": "bar"}  2019-06-18 19:43:26

But the SET is only run for the first f1.id = temp1.fileId and not the second or third (which kind of makes sense) - and the result looks like this:

File id:187

meta

{
    "statusInfo": {
        "101": {
            "createdAt": "2019-06-18 19:43:19.000000"
        }
    }
}

  1. How can I achieve that the two other status (102, 202) are also added to the meta column?
  2. How can I also get the FileLogs' meta data in there as well (if not null)?

Solution

  • You can pre-aggregate the JSON in the subquery. Here's a demo:

    select f.id,
      json_objectagg(l.statusId,
        json_object(
          'createdAt', l.createdAt, 
          'meta', l.meta
      )) as statusInfo
    from Files as f
    join FileLogs as l on f.id = l.fileId
    group by f.id;
    

    Result given your example data:

            id: 187
    statusInfo: {
      "101": {
        "meta": null,
        "createdAt": "2019-06-18 19:43:19.000000"
      },
      "102": {
        "meta": null,
        "createdAt": "2019-06-18 19:43:22.000000"
      },
      "202": {
        "meta": {
          "foo2": "bar2"
        },
        "createdAt": "2019-06-18 19:45:32.000000"
      }
    }
    

    Now that this resembles the JSON structure you want, we can wrap this into an UPDATE:

    update Files
    join (
      select fileId,
        json_objectagg(statusId,
          json_object(
            'createdAt', createdAt,
            'meta', meta
          )
        ) as statusInfo
      from FileLogs
      group by fileId
    ) as f on Files.id = f.fileId
    set meta = json_object('statusInfo', f.statusInfo);
    

    However, you should think about the implications of doing this.

    • Storing data in JSON takes more space than storing the same data in rows of FileLogs.

    • Adding subsequent information to the File.meta JSON document can be done with JSON merging, but that will certainly take more careful thought than simply INSERT a new row to FileLogs.

    • How will you need to query the statusInfo in the future? If you need to search for entries with certain status, or which one is the most recent, or which one had a status entry on a given date, this could be tricky to search the JSON document, whereas it is much easier if you keep the status info in rows of FileLogs. You should make sure you know how to do all the queries you need to do before you commit to using JSON.