Search code examples
sqlsqlitesubquerymaxgreatest-n-per-group

Select max for a tuple in table


I have a table that looks like this:

host, job, folder, file, mtime

Folder names are not unique and can be same for a job spread across different hosts. I need to pick folder where the max(mtime for a file) is the max across all the folders by the same name across different hosts. Roughly I need something like this:

Select (host, folder) pair where tuple (host, job, folder) max(max (file mtime))

Example:

1, j1, f1, e1, 2
2, j1, f1, e2, 0
2, j1, f1, e9, 3
3, j1, f1, e3, 2
1, j2, f2, e4, 3
2, j2, f2, e5, 4
3, j2, f2, e6, 5
1, j3, f3, e7, 6
2, j3, f3, e8, 7

result would be:

2, j1, f1, e9, 3
3, j2, f2, e6, 5
2, j3, f3, e8, 7

The table is huge, so I trying to find out best possible way to do this. Thanks


Solution

  • A window function like ROW_NUMBER() should provide the best performance:

    SELECT host, job, folder, file, mtime
    FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY folder, job ORDER BY mtime DESC) rn
      FROM tablename
    ) 
    WHERE rn = 1
    

    See the demo.
    Results:

    | host | job | folder | file | mtime |
    | ---- | --- | ------ | ---- | ----- |
    | 2    | j1  | f1     | e9   | 3     |
    | 3    | j2  | f2     | e6   | 5     |
    | 2    | j3  | f3     | e8   | 7     |