Search code examples
sqlsql-servergreatest-n-per-group

Select rows in one table, adding column where MAX(Date) of rows in other, related table


I have a table containing a set of tasks to perform:

Task

    ID  Name

    1   Washing Up
    2   Hoovering
    3   Dusting

The user can add one or more Notes to a Note table. Each note is associated with a task:

Note

    ID  ID_Task     Completed(%)    Date

    11  1       25      05/07/2013 14:00
    12  1       50      05/07/2013 14:30
    13  1       75      05/07/2013 15:00
    14  3       20      05/07/2013 16:00
    15  3       60      05/07/2013 17:30

I want a query that will select the Task ID, Name and it's % complete, which should be zero if there aren't any notes for it. The query should return:

    ID  Name        Completed (%)

    1   Washing Up  75
    2   Hoovering    0
    3   Dusting     60

I've really been struggling with the query for this, which I've read is a "greatest n per group" type problem, of which there are many examples on SO, none of which I can apply to my case (or at least fully understand). My intuition was to start by finding the MAX(Date) for each task in the note table:

SELECT  ID_Task,
            MAX(Date) AS Date
            FROM
            Note
            GROUP BY
            ID_Task

Annoyingly, I can't just add "Complete %" to the above query unless it's contained in a GROUP clause. Argh! I'm not sure how to jump through this hoop in order to somehow get the task table rows with the column appended to it. Here is my pathetic attempt, which fails as it only returns tasks with notes and then duplicates task records at that (one for each note, so it's a complete fail).

SELECT  Task.ID,
    Task.Name,
    Note.Complete
    FROM
    Task        
    JOIN
    (SELECT ID_Task,
        MAX(Date) AS Date
        FROM
        Note
        GROUP BY
        ID_Task) AS InnerNote
    ON
    Task.ID = InnerNote.ID_Task
    JOIN
    Note
    ON
    Task.ID = Note.ID_Task

Can anyone help me please?


Solution

  • If we assume that tasks only become more complete, you can do this with a left outer join and aggregation:

    select t.ID, t.Name, coalesce(max(n.complete), 0)
    from tasks t left outer join
         notes n
         on t.id = n.id_task
    group by t.id, t.name
    

    If tasks can become "less complete" then you want the one with the last date. For this, you can use row_number():

    select t.ID, t.Name, coalesce(n.complete, 0)
    from tasks t left outer join
         (select n.*, row_number() over (partition by id_task order by date desc) as seqnum
          from notes n
         ) n
         on t.id = n.id_task and n.seqnum = 1;
    

    In this case, you don't need a group by, because the seqnum = 1 performs the same role.