I am really struggling with producing some code create 'level1 question' & 'level2 question' columns.
The logic to populate the columns is:
But I can not find an efficient way of creating 'level1 question' & 'level2 question' like in the example.
id | sort | type | question | level | answer | L1 | L2 | level1 question | level2 question |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | Group | Location | 1 | NULL | Location | NULL | Location | NULL |
1 | 2 | Question | UK | 2 | Yes | NULL | NULL | Location | NULL |
1 | 3 | Question | EU | 2 | No | NULL | NULL | Location | NULL |
1 | 4 | Group | Asia | 2 | Yes | NULL | ASIA | Location | ASIA |
1 | 5 | Question | AUS | 3 | Yes | NULL | NULL | Location | ASIA |
1 | 6 | Question | NZ | 3 | No | NULL | NULL | Location | ASIA |
1 | 7 | Group | Season | 1 | NULL | Season | NULL | Season | NULL |
1 | 8 | Question | Summer | 2 | Yes | NULL | NULL | Season | NULL |
2 | 1 | Group | City | 1 | NULL | City | NULL | City | NULL |
2 | 2 | Question | London | 2 | Yes | NULL | NULL | City | NULL |
2 | 3 | Group | Street | 2 | NULL | NULL | Street | City | Street |
2 | 4 | Question | King ST | 3 | No | NULL | NULL | City | Street |
I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.
'level1 question' =
(select top 1 L1
from table2 t2
where L1 is not null
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
, 'level2 question' =
(select top 1 L2
from table2 t2
where L2 is not null AND (Level >2 OR Type = 'Group')
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
The syntax of your code suggests that you are running SQL Server.
In the 2022 version, we can use last_value()
with ignore nulls
to efficiently retrieve the latest non-null of the conditional computation:
select t.*,
last_value(case when level = 1 and type = 'Group' then question end)
ignore nulls
over(partition by id order by sort) as level1_question,
last_value(case when level = 2 and type = 'Group' then question end)
ignore nulls
over(partition by id order by sort) as level2_question
from mytable t
In earlier versions (or in other databases that do not support ignore nulls
), we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:
select t.*,
max(question1) over(partition by id, grp1 order by sort) as level1_question,
max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
select t.*,
count(question1) over(partition by id order by sort) grp1,
count(question2) over(partition by id order by sort) grp2
from mytable t
cross apply (
values (
case when level = 1 and type = 'Group' then question end,
case when level = 2 and type = 'Group' then question end
)
) as v(question1, question2)
) t