Search code examples
sqlsubquerygreatest-n-per-groupsnowflake-cloud-data-platformwindow-functions

Count number of records based on last updated date + null


Having trouble with what I thought would be pretty simple. Trying to get a count of items that don't have a title, our system tracks records by 'last_updated_date'.

id    work_item      title         last_updated_date  
1     task1          ProjectA      2020-03-25 20:20:01.111
2     task2          ProjectA      2020-03-25 20:20:01.111
3     task3                        2020-03-25 20:20:01.111
4     task4          ProjectB      2020-03-25 20:20:01.111
5     task5          ProjectC      2020-03-25 20:20:01.111

As you can see, I need to see what work items don't have a project. If I were to look at the history for 'task1' I would see something like following:

select work_item, last_updated_date, project
from table
where work_item = 'task1'

This would output:

work_item      last_updated_date          title
task1          2020-03-25 20:20:01.111    ProjectA
task1          2020-03-17 20:20:01.111    NULL
task1          2020-03-12 20:20:01.111    NULL

Based on this, I can see that task1 got a title assigned on 2020-03-25. What I need to know, is how many work items in the entire data set do not have a item assigned. So I want to check all records using the last updated date and check the latest last_updated_date to see if title is null. I tried the following, but I think I'm using MAX and/or group by wrong? I'm getting records back that have titles assigned, which makes me think it's checking the MAX value of the entire last_updated_date column, rather than each record within.

select id, title, MAX(last_updated_date) as "latest_timestamp"
FROM table
WHERE title is null
group by id, title

What I would like to see is only task3 show up as needing a title assigned.


Solution

  • One option uses a subquery to filter on the latest record per item. You can then count how many of them have no title:

    select count(*)
    from mytable t
    where 
        last_updated_date = (
            select max(t1.last_updated_date) 
            from mytable t1 
            where t1.work_item = t.work_item
        )
        and title is null
    

    You can also use window functions:

    select count(*)
    from (
        select t.*, 
            row_number() over(partition by work_item order by last_updated_date desc) rn
        from mytable t
    ) t
    where rn = 1 and title is null