Search code examples
mysqlarraysmysql-json

Transform JSON field with MySQL


I have a JSON field in a tableA in which I save the moment when a row changes from one state to another like so,

row_id state_history
1 {"2021-09-14 21:00": "State #4", "2021-09-16 21:00": "State #1", "2021-09-17 21:00": "State #6"}
... ...

Is it possible to use this JSON in MySQL to generate a table in which I can measure the time it takes to change from one state to another? Like this:

row_id Initial_state Final_state Time_diff
1 State #4 State #1 2 days
1 State #1 State #6 1 day
2 State #5 State #2 1 day
2 State #2 State #1 4 days
2 State #1 State #6 1 day
... ... ... ...

Please notice that the number of states per row will be different. It doesn't really matter if the time difference measure is in minutes, hours or days.

For the state change part I have tried the following, however this way I can only get the first and the second state of each row. I have no idea how to make the time difference part.

SELECT A.row_id, A.state ->> '$[0]' AS Initial_state, A.state ->> '$[1]' AS Final_state
FROM 
(SELECT 
  row_id,
  state_history -> '$.*[0]' AS state
FROM 
  tableA) A

And if possible, group by pair of states (Initial_state, Final_state) so I can have a metric that averages the time it takes to change from a specific state to another.


Solution

  • WITH cte1 AS (
        SELECT test.row_id, 
               jsontable.`date`, 
               JSON_UNQUOTE(JSON_EXTRACT(test.state_history, CONCAT('$."', jsontable.`date`, '"'))) state
        FROM test
        CROSS JOIN JSON_TABLE(JSON_KEYS(state_history),
                              '$[*]' COLUMNS (`date` VARCHAR(64) PATH '$')) jsontable
    ),
    cte2 AS (
        SELECT row_id, 
               LAG(state) OVER (PARTITION BY row_id ORDER BY `date`) Initial_state,
               state Final_state,
               DATEDIFF(`date`, LAG(`date`) OVER (PARTITION BY row_id ORDER BY `date`)) Time_diff,
               `date`
        FROM cte1
    )
    SELECT row_id,
           CAST(Initial_state AS CHAR) Initial_state,
           CAST(Final_state AS CHAR) Final_state,
           Time_diff
    FROM cte2
    WHERE Initial_state IS NOT NULL
    ORDER BY row_id, `date`
    

    step-by-step fiddle