Search code examples
mysqlsqldatetimerecursive-querymetabase

MySQL sum values diagonally


I have a table which looks like this:

| created_at | Current_Value | M1  | M2  | M3  | M4  |
| ---------- | ------------- | --- | --- | --- | --- |
| 01/08/2020 | 840           | 840 | 838 | 838 | 838 |
| 01/09/2020 | 65            | 63  | 61  | 59  | 0   |
| 01/10/2020 | 109           | 104 | 99  | 0   | 0   |
| 01/11/2020 | 105           | 100 | 0   | 0   | 0   |
| 01/12/2020 | 61            | 0   | 0   | 0   | 0   |

The Current_Value columns stores count of items created in the created_at month, the columns M1 - M4 stores Current_Value value minus the decrease of that value in the following months relative to the created_at month, so for example columns M1 in the second row (date 01/09/2020) tells me that in M1 (next month, so 01/10/2020) the value was 63, so it decreased by 2 (65 - 63) and so on.

Now I would like to create an aggregated report in Metabase showing the overall decrease in the number of items per month - meaning ONE total row.

Now the problem is, that because I am creating this report in Metabase I can not use stored routines, nor can I use any other language, so I am stuck just with basic MySQL.

What I need to achieve is to diagonally add the values, so, for example, I need to do this:

NOTE: the phrase previous row is abbreviated as pr so it is more concise

| created_at | Current_Value | M1            | M2            | M3             | M4  |
| ---------- | ------------- | ------------- | ------------- | -------------- | --- |
| 01/08/2020 | 840           | 840           | 838           | 838            | 838 |
| 01/09/2020 | 65 + M1 in pr | 63 + M2 in pr | 61 + M3 in pr | 59 + M4 in pr  | 0   |
| 01/10/2020 | 109+ M1 in pr | 104+ M2 in pr | 99 + M3 in pr | 0              | 0   |
| 01/11/2020 | 105+ M1 in pr | 100+ M2 in pr | 0             | 0              | 0   |
| 01/12/2020 | 61 + M1 in pr | 0             | 0             | 0              | 0   |

Now under normal circumstances, I would just use session variables to achieve this, so I would do something like:

(@varM1 :=  0  + @varM1) AS M1_Prev, ( @varM1 := M1) M1_Now

to get the value from the previous row so I could add it to the current row, however, what I need to do here is not just take the value from the previous row but take already summed value from the previous row, so what I actually need to do would look like this:

NOTE: each cell is stretched to 3 rows, however, it still represents only one cell

| created_at | Current_Value | M1            | M2            | M3             | M4  |
| ---------- | ------------- | ------------- | ------------- | -------------- | --- |
| 01/08/2020 | 840           | 840           | 838           | 838            | 838 |
| ---------- | ------------- | ------------- | ------------- | -------------- | --- |
| 01/09/2020 | 65 + M1 in pr | 63 + M2 in pr | 61 + M3 in pr | 59 + M4 in pr  | 0   |
|            | = 65 + 840    | = 63 + 838    | = 61 + 838    | = 59 + 838     |     |
|            | = 905         | = 901         | = 899         | = 897          |     |
| ---------- | ------------- | ------------- | ------------- | -------------- | --- |
| 01/10/2020 | 109+ M1 in pr | 104+ M2 in pr | 99 + M3 in pr | 0              | 0   |
|            | = 109 + 901   | = 104 + 899   | = 99 + 897    |                |     |
|            | = 1010        | = 1003        | = 996         |                |     |
| ---------- | ------------- | ------------- | ------------- | -------------- | --- |
| 01/11/2020 | 105+ M1 in pr | 100+ M2 in pr | 0             | 0              | 0   |
|            | = 105 + 1003  | = 100 + 996   |               |                |     |
|            | = 1108        | = 1096        |               |                |     |
| ---------- | ------------- | ------------- | ------------- | -------------- | --- |
| 01/12/2020 | 61 + M1 in pr | 0             | 0             | 0              | 0   |
|            | = 61 + 1096   |               |               |                |     |
|            | = 1157        |               |               |                |     |
| ---------- | ------------- | ------------- | ------------- | -------------- | --- |

So I need to end up with the same table, but with values aggregated in this way, so I can work with the result.

I have been scratching my head over this problem for the past few hours rearranging and creating new variables to achieve this, but unfortunately, I had no success.

Is this even possible with plain MySQL? Is it is possible, could you please point me in the right direction, because I am unfortunately stuck.

I am using MySQL v5.7.

Thank you very much


Solution

  • The data model makes things more difficult than they should. Although this might possible with unpivot/pivot and window functions, I find that the logic is simpler to express with a recursive query:

    with  recursive
        data as (select t.*, row_number() over(order by created_at) rn from mytable t),
        cte as (
            select rn, created_at, current_value, m1, m2, m3, m4 from data where rn = 1
            union all
            select d.rn, d.created_at,
                d.current_value + c.m1, d.m1 + c.m2, d.m2 + c.m3, d.m3 + c.m4, d.m4
            from cte c
            inner join data d on d.rn = c.rn + 1
        )
    select * from cte
    

    This basically traverses the dataset from the earliest date to the latest, while performing the computation across rows as needed.

    Demo on DB Fiddle:

    rn | created_at | current_value |   m1 |  m2 |  m3 |  m4
    -: | :--------- | ------------: | ---: | --: | --: | --:
     1 | 2020-08-01 |           840 |  840 | 838 | 838 | 838
     2 | 2020-09-01 |           905 |  901 | 899 | 897 |   0
     3 | 2020-10-01 |          1010 | 1003 | 996 |   0 |   0
     4 | 2020-11-01 |          1108 | 1096 |   0 |   0 |   0
     5 | 2020-12-01 |          1157 |    0 |   0 |   0 |   0
    

    Note that window functions are available in MySQL 8.0 only.