Search code examples
sqlpostgresqlsqlitecalculated-columnslag

How do I create a permanent (STORED) GENERATED (calculated/computed) column in SQLIte3 using the LAG function and IF-THEN-ELSE?


I have the following data (subset here):

![enter image description here

datetime,sys_id,cputil,memfree,sessnum
2019/05/03 08:06:14,100,0.57,0.51,47
2019/05/03 08:11:14,100,0.47,0.62,43
2019/05/03 08:16:14,100,0.56,0.57,62
2019/05/03 08:21:14,100,0.57,0.56,50
2019/05/03 08:26:14,100,0.35,0.46,43
2019/05/03 08:31:14,100,0.41,0.58,48
2019/05/03 08:36:14,100,0.57,0.35,58
2019/05/03 08:41:14,100,0.41,0.4,58
2019/05/03 08:46:14,100,0.53,0.35,62
2019/05/03 08:51:14,100,0.51,0.6,45
2019/05/03 08:56:14,100,0.32,0.37,47
2019/05/03 09:01:14,100,0.62,0.59,60
2019/05/03 09:06:14,100,0.66,0.72,57
2019/05/03 09:11:14,100,0.54,0.54,44
2019/05/03 09:16:14,100,0.29,0.4,47
2019/05/03 09:21:14,100,0.43,0.68,66
2019/05/03 09:26:14,100,0.49,0.66,65
2019/05/03 09:31:14,100,0.64,0.55,66
2019/05/03 09:36:14,100,0.42,0.6,42
2019/05/03 09:41:14,100,0.55,0.59,63

I need to create a permanent calculated column that tracks the difference for the cpu_util and memfree columns. That is, for each row, I need to know the amount it has changed from the previous row in the same column.

So the table should look like this: ![enter image description here

datetime,sys_id,cputil,memfree,sessnum,util_diff,mem_diff
2019/05/03 08:06:14,100,0.57,0.51,47,[NULL],[NULL]
2019/05/03 08:11:14,100,0.47,0.62,43,-0.1,0.11
2019/05/03 08:16:14,100,0.56,0.57,62,0.0900000000000001,-0.05
2019/05/03 08:21:14,100,0.57,0.56,50,0.0099999999999999,-0.0099999999999999
2019/05/03 08:26:14,100,0.35,0.46,43,-0.22,-0.1
2019/05/03 08:31:14,100,0.41,0.58,48,0.06,0.12
2019/05/03 08:36:14,100,0.57,0.35,58,0.16,-0.23
2019/05/03 08:41:14,100,0.41,0.4,58,-0.16,0.05
2019/05/03 08:46:14,100,0.53,0.35,62,0.12,-0.05
2019/05/03 08:51:14,100,0.51,0.6,45,-0.02,0.25
2019/05/03 08:56:14,100,0.32,0.37,47,-0.19,-0.23
2019/05/03 09:01:14,100,0.62,0.59,60,0.3,0.22
2019/05/03 09:06:14,100,0.66,0.72,57,0.04,0.13
2019/05/03 09:11:14,100,0.54,0.54,44,-0.12,-0.18
2019/05/03 09:16:14,100,0.29,0.4,47,-0.25,-0.14
2019/05/03 09:21:14,100,0.43,0.68,66,0.14,0.28
2019/05/03 09:26:14,100,0.49,0.66,65,0.06,-0.02
2019/05/03 09:31:14,100,0.64,0.55,66,0.15,-0.11
2019/05/03 09:36:14,100,0.42,0.6,42,-0.22,0.0499999999999999
2019/05/03 09:41:14,100,0.55,0.59,63,0.13,-0.01

I have found various tutorials on using the LAG function, but I cannot get it to work when trying to use LAG in a GENERATED column context rather than a SELECT statement context.

This needs to be automatically calculated and updated every time a new row is added.

Trying this also in PostgreSQL, I have the following code for creating the original table without the necessary diff columns:

CREATE TABLE myTable (
    datetime TEXT,
    sys_id INT,
    cputil REAL,
    memfree REAL,
    sessnum INT
)

This is fine.

Now, what I need the table to have is extra columns that are automatically calculated (GENERATED) but when I try the following, it fails:

CREATE TABLE myTable (
    datetime TEXT,
    sys_id INT,
    cputil REAL,
    memfree REAL,
    sessnum INT,
    util_diff REAL GENERATED ALWAYS AS (cputil - LAG(cputil)) OVER (ORDER BY datetime) STORED,
    mem_diff REAL GENERATED ALWAYS AS (memfree - LAG(memfree)) OVER (ORDER BY datetime) STORED
)

NOTE: assume datetime format is YYYY/MM/DD for the purposes of ORDER BY (row sorting).

I know that I probably need to create additional 'intermediate' columns that hold the LAG output, like this: ![enter image description here

datetime,sys_id,cputil,memfree,sessnum,util_lag,mem_lag,util_diff,mem_diff
2019/05/03 08:06:14,100,0.57,0.51,47,[NULL],[NULL],[NULL],[NULL]
2019/05/03 08:11:14,100,0.47,0.62,43,0.57,0.51,-0.1,0.11
2019/05/03 08:16:14,100,0.56,0.57,62,0.47,0.62,0.0900000000000001,-0.05
2019/05/03 08:21:14,100,0.57,0.56,50,0.56,0.57,0.0099999999999999,-0.0099999999999999
2019/05/03 08:26:14,100,0.35,0.46,43,0.57,0.56,-0.22,-0.1
2019/05/03 08:31:14,100,0.41,0.58,48,0.35,0.46,0.06,0.12
2019/05/03 08:36:14,100,0.57,0.35,58,0.41,0.58,0.16,-0.23
2019/05/03 08:41:14,100,0.41,0.4,58,0.57,0.35,-0.16,0.05
2019/05/03 08:46:14,100,0.53,0.35,62,0.41,0.4,0.12,-0.05
2019/05/03 08:51:14,100,0.51,0.6,45,0.53,0.35,-0.02,0.25
2019/05/03 08:56:14,100,0.32,0.37,47,0.51,0.6,-0.19,-0.23
2019/05/03 09:01:14,100,0.62,0.59,60,0.32,0.37,0.3,0.22
2019/05/03 09:06:14,100,0.66,0.72,57,0.62,0.59,0.04,0.13
2019/05/03 09:11:14,100,0.54,0.54,44,0.66,0.72,-0.12,-0.18
2019/05/03 09:16:14,100,0.29,0.4,47,0.54,0.54,-0.25,-0.14
2019/05/03 09:21:14,100,0.43,0.68,66,0.29,0.4,0.14,0.28
2019/05/03 09:26:14,100,0.49,0.66,65,0.43,0.68,0.06,-0.02
2019/05/03 09:31:14,100,0.64,0.55,66,0.49,0.66,0.15,-0.11
2019/05/03 09:36:14,100,0.42,0.6,42,0.64,0.55,-0.22,0.0499999999999999
2019/05/03 09:41:14,100,0.55,0.59,63,0.42,0.6,0.13,-0.01

in which case the SQL should look something like this:

CREATE TABLE myTable (
    datetime TEXT,
    sys_id INT,
    cputil REAL,
    memfree REAL,
    sessnum INT,
    util_lag REAL GENERATED ALWAYS AS LAG(cputil) OVER (ORDER BY datetime) STORED,
    mem_lag REAL GENERATED ALWAYS AS LAG(memfree) OVER (ORDER BY datetime) STORED,
    util_diff REAL GENERATED ALWAYS AS (cputil - util_lag) STORED,
    mem_diff REAL GENERATED ALWAYS AS (memfree - mem_lag) STORED
)

The last calculated column I need to add to the table is a text string that indicates whether the change (difference from previous row) was "Up" or "Down", so the final table should look like this: ![enter image description here

datetime,sys_id,cputil,memfree,sessnum,util_lag,mem_lag,util_diff,mem_diff,change
2019/05/03 08:06:14,100,0.57,0.51,47,[NULL],[NULL],[NULL],[NULL],[NULL]
2019/05/03 08:11:14,100,0.47,0.62,43,0.57,0.51,-0.1,0.11,Down
2019/05/03 08:16:14,100,0.56,0.57,62,0.47,0.62,0.0900000000000001,-0.05,Up
2019/05/03 08:21:14,100,0.57,0.56,50,0.56,0.57,0.0099999999999999,-0.0099999999999999,Up
2019/05/03 08:26:14,100,0.35,0.46,43,0.57,0.56,-0.22,-0.1,Down
2019/05/03 08:31:14,100,0.41,0.58,48,0.35,0.46,0.06,0.12,Up
2019/05/03 08:36:14,100,0.57,0.35,58,0.41,0.58,0.16,-0.23,Up
2019/05/03 08:41:14,100,0.41,0.4,58,0.57,0.35,-0.16,0.05,Down
2019/05/03 08:46:14,100,0.53,0.35,62,0.41,0.4,0.12,-0.05,Up
2019/05/03 08:51:14,100,0.51,0.6,45,0.53,0.35,-0.02,0.25,Down
2019/05/03 08:56:14,100,0.32,0.37,47,0.51,0.6,-0.19,-0.23,Down
2019/05/03 09:01:14,100,0.62,0.59,60,0.32,0.37,0.3,0.22,Up
2019/05/03 09:06:14,100,0.66,0.72,57,0.62,0.59,0.04,0.13,Up
2019/05/03 09:11:14,100,0.54,0.54,44,0.66,0.72,-0.12,-0.18,Down
2019/05/03 09:16:14,100,0.29,0.4,47,0.54,0.54,-0.25,-0.14,Down
2019/05/03 09:21:14,100,0.43,0.68,66,0.29,0.4,0.14,0.28,Up
2019/05/03 09:26:14,100,0.49,0.66,65,0.43,0.68,0.06,-0.02,Up
2019/05/03 09:31:14,100,0.64,0.55,66,0.49,0.66,0.15,-0.11,Up
2019/05/03 09:36:14,100,0.42,0.6,42,0.64,0.55,-0.22,0.0499999999999999,Down
2019/05/03 09:41:14,100,0.55,0.59,63,0.42,0.6,0.13,-0.01,Up

So I am trying the following code:

CREATE TABLE myTable (
    datetime TEXT,
    sys_id INT,
    cputil REAL,
    memfree REAL,
    sessnum INT,
    util_lag REAL GENERATED ALWAYS AS LAG(cputil) OVER (ORDER BY datetime) STORED,
    mem_lag REAL GENERATED ALWAYS AS LAG(memfree) OVER (ORDER BY datetime) STORED,
    util_diff REAL GENERATED ALWAYS AS (cputil - util_lag) STORED,
    mem_diff REAL GENERATED ALWAYS AS (memfree - mem_lag) STORED,
    change TEXT GENERATED ALWAYS AS CAST (
        CASE
            WHEN util_diff > 0 THEN 'Down'
            WHEN util_diff < 0 THEN 'Up'
    ) END AS STORED
)

But again, this fails with an error.

Finally, I need to know the correct syntax to do this in SQLite3 as well as PostgreSQL.


Solution

  • This is not possible, you can't use a generated column for this. From the manual: The generation expression can only use immutable functions and cannot use subqueries or reference anything other than the current row in any way.

    You can create a trigger + trigger function for this.