sql mysql window-functions gaps-and-islands

Fill in Nulls with Row above

I am really struggling with producing some code create 'level1 question' & 'level2 question' columns.

The logic to populate the columns is:

'L1' when level = 1 & type = group then question
'L2' when level = 2 & type = group then question

But I can not find an efficient way of creating 'level1 question' & 'level2 question' like in the example.

id	sort	type	question	level	answer	L1	L2	level1 question	level2 question
1	1	Group	Location	1	NULL	Location	NULL	Location	NULL
1	2	Question	UK	2	Yes	NULL	NULL	Location	NULL
1	3	Question	EU	2	No	NULL	NULL	Location	NULL
1	4	Group	Asia	2	Yes	NULL	ASIA	Location	ASIA
1	5	Question	AUS	3	Yes	NULL	NULL	Location	ASIA
1	6	Question	NZ	3	No	NULL	NULL	Location	ASIA
1	7	Group	Season	1	NULL	Season	NULL	Season	NULL
1	8	Question	Summer	2	Yes	NULL	NULL	Season	NULL
2	1	Group	City	1	NULL	City	NULL	City	NULL
2	2	Question	London	2	Yes	NULL	NULL	City	NULL
2	3	Group	Street	2	NULL	NULL	Street	City	Street
2	4	Question	King ST	3	No	NULL	NULL	City	Street

I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.

'level1 question' = 
                (select top 1 L1 
                from table2 t2
                where L1 is not null
                and t1.Sort >= t2.Sort and t1.ID = t2.ID
                order by id , sort DESC)
, 'level2 question' = 
            (select top 1 L2 
            from table2 t2
            where L2 is not null AND (Level >2 OR Type = 'Group')
            and t1.Sort >= t2.Sort and t1.ID = t2.ID
            order by id , sort DESC)

Solution

The syntax of your code suggests that you are running SQL Server.

In the 2022 version, we can use last_value() with ignore nulls to efficiently retrieve the latest non-null of the conditional computation:

select t.*,
    last_value(case when level = 1 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level1_question,
    last_value(case when level = 2 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level2_question
from mytable t

In earlier versions (or in other databases that do not support ignore nulls), we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:

select t.*,
    max(question1) over(partition by id, grp1 order by sort) as level1_question,
    max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
    select t.*,
        count(question1) over(partition by id order by sort) grp1,
        count(question2) over(partition by id order by sort) grp2
    from mytable t
    cross apply (
        values (
            case when level = 1 and type = 'Group' then question end,
            case when level = 2 and type = 'Group' then question end
        ) 
    ) as v(question1, question2)
) t