I have a datatable (dt) like the following in SQL:
ID state_id act rd_1 rd_2 rd_3 rd_4 rd_5
11 abc,13.3 1 1. 31 17.4 32.4 0.4
11 afd,23.2 4 1. 42.1 1.3 31.9 0.39
11 dfw,154 7 0. 0.3 4.3 8.21 163.3
12 vab,64.5 8 1. 32.3 11 2.1 21.3
12 avf,542 2 0. 2.12 28.2 8.12 57.5
12 vjg,35 4 1. 5.7 8.64 7.46 0.25
13 vaw,424.2 4 1. 64.3 0.435 4.3 35.3
14 bda,243 1 0. 4.4 4.6 2.4 4.2
15 rbe,24.2 3 1. 43 53.5 4.4 8.5
I want to, for each row, calculate the variance of values from rd_1 to rd_5 (they are doubles). ID and state_id uniquely identifies a row. The desired output is the like the following:
ID state_id act rd_1 rd_2 rd_3 rd_4 rd_5. var_rd
11 abc,13.3 1 1. 31 17.4 32.4 0.4 192.6624
11 afd,23.2 4 1. 42.1 1.3 31.9 0.39 323.3181
11 dfw,154 7 0. 0.3 4.3 8.21 163.3 4109.9855
12 vab,64.5 8 1. 32.3 11 2.1 21.3 141.3463
13 vaw,424.2 4 1. 64.3 0.435 4.3 35.3 636.2333
14 bda,243 1 0. 4.4 4.6 2.4 4.2 3.0496
15 rbe,24.2 3 1. 43 53.5 4.4 8.5 473.2456
I know it is possible to use pivot to flatten the data and then calculate variance on column (rd_value) in the flattened data. But the SQL I use do not support Pivot method. I tried using UNION but it appears that it messes up with user_id.
I would approach this just by applying the formula for variance:
select t.*,
( (rd_1 - rd_avg) * (rd_1 - rd_avg) +
(rd_2 - rd_avg) * (rd_2 - rd_avg) +
(rd_3 - rd_avg) * (rd_3 - rd_avg) +
(rd_4 - rd_avg) * (rd_4 - rd_avg) +
(rd_5 - rd_avg) * (rd_5 - rd_avg)
)/4 as variance
from (select t.*,
(rd_1 + rd_2 + rd_3 + rd_4 + rd_5) / 5 as rd_avg -- This assumes you always have all 5 columns filled
from t
) t