Search code examples
sqlmysqlwindow-functionsgaps-and-islands

Fill in Nulls with Row above


I am really struggling with producing some code create 'level1 question' & 'level2 question' columns.

The logic to populate the columns is:

  • 'L1' when level = 1 & type = group then question
  • 'L2' when level = 2 & type = group then question

But I can not find an efficient way of creating 'level1 question' & 'level2 question' like in the example.

id sort type question level answer L1 L2 level1 question level2 question
1 1 Group Location 1 NULL Location NULL Location NULL
1 2 Question UK 2 Yes NULL NULL Location NULL
1 3 Question EU 2 No NULL NULL Location NULL
1 4 Group Asia 2 Yes NULL ASIA Location ASIA
1 5 Question AUS 3 Yes NULL NULL Location ASIA
1 6 Question NZ 3 No NULL NULL Location ASIA
1 7 Group Season 1 NULL Season NULL Season NULL
1 8 Question Summer 2 Yes NULL NULL Season NULL
2 1 Group City 1 NULL City NULL City NULL
2 2 Question London 2 Yes NULL NULL City NULL
2 3 Group Street 2 NULL NULL Street City Street
2 4 Question King ST 3 No NULL NULL City Street

I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.

'level1 question' = 
                (select top 1 L1 
                from table2 t2
                where L1 is not null
                and t1.Sort >= t2.Sort and t1.ID = t2.ID
                order by id , sort DESC)
, 'level2 question' = 
            (select top 1 L2 
            from table2 t2
            where L2 is not null AND (Level >2 OR Type = 'Group')
            and t1.Sort >= t2.Sort and t1.ID = t2.ID
            order by id , sort DESC)

Solution

  • The syntax of your code suggests that you are running SQL Server.

    In the 2022 version, we can use last_value() with ignore nulls to efficiently retrieve the latest non-null of the conditional computation:

    select t.*,
        last_value(case when level = 1 and type = 'Group' then question end) 
            ignore nulls
            over(partition by id order by sort) as level1_question,
        last_value(case when level = 2 and type = 'Group' then question end) 
            ignore nulls
            over(partition by id order by sort) as level2_question
    from mytable t
    

    In earlier versions (or in other databases that do not support ignore nulls), we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:

    select t.*,
        max(question1) over(partition by id, grp1 order by sort) as level1_question,
        max(question2) over(partition by id, grp2 order by sort) as level2_question
    from (
        select t.*,
            count(question1) over(partition by id order by sort) grp1,
            count(question2) over(partition by id order by sort) grp2
        from mytable t
        cross apply (
            values (
                case when level = 1 and type = 'Group' then question end,
                case when level = 2 and type = 'Group' then question end
            ) 
        ) as v(question1, question2)
    ) t