Search code examples
sqlsql-servercommon-table-expression

How to effective get the max data in sub group and get the min data in among big groups?


Firstly, I hope to get the max date in each sub-groups.

Group A = action 1 & 2

Group B = action 3 & 4

actionName action actionBy actiontime
999 1 Tom 2022-07-15 09:18:00
999 1 Tom 2022-07-15 15:21:00
999 2 Peter 2022-07-15 14:06:00
999 2 Peter 2022-07-15 14:08:00
999 3 Sally 2022-07-15 14:20:00
999 3 Mary 2022-07-15 14:22:00
999 4 Mary 2022-07-15 14:25:00

In this example:

The max time of group A is "1 | Tom | 2022-07-15 15:21:00 "

The max time of group B is " 4 | Mary | 2022-07-15 14:25:00 "

The final answer is "1 | Tom | 2022-07-15 14:25:00 ", which is the minimum data among groups.

I have a method how to get the max date in each group like the following code.

with cte1
as (select actionName,
           actiontime,
           actionBy,
           row_number() over (partition by actionName order by actiontime desc) as rn
    from actionDetails
    where action in ( '1', '2' )
    UNION
    select actionName,
           actiontime,
           actionBy,
           row_number() over (partition by actionName order by actiontime desc) as rn
    from actionDetails
    where action in ( '3', '4' )
   )
select *
from cte1
where rn = 1

ActionName is not PK. It would get the max data in each group.

Then, I don't know how to use an effective way to get the minimum data between group A and group B. Would you give me some ideas?

I know one of the methods is self join again. However, I think that is not the best solution.


Solution

  • First of all, you can simplify your query by putting the action groups into the partition clause. Use a case expression to get one group for actions 1 and 2 and another for actions 3 and 4.

    Then after getting the maximum dates per actionname and action group you want to get the minimum dates of these per actionname. This means you want a second CTE building up on the first one:

    with max_per_group as
    (
      select top(1) with ties
        actionname,
        actiontime,
        actionby
      from actiondetails
      where action in (1, 2, 3, 4)
      order by row_number() 
                 over (partition by actionname, case when action <= 2 then 1 else 2 end
                       order by actiontime desc)
    )
    , min_of_max as
    (
      select top(1) with ties
        actionname,
        actiontime,
        actionby
      from max_per_group
      order by row_number() over (partition by actionname order by actiontime)
    )
    select actionname, actiontime, actionby
    from min_of_max
    order by actionname;
    

    As you see, instead of computing a row number and then have to limit rows based on that in the next query, I limit the rows right away by putting the row numbering into the ORDER BY clause and applying TOP(1) WITH TIES to get all rows numbered 1. I like this a tad better, because the CTE already produces the rows that I want to work with rather than only marking them in a bigger data set. But that's personal preference I guess.

    Discaimer:

    In my query I assume that the column action is numeric. If the column is a string instead, because it can hold values that are not numbers, then work with strings:

    where action in ('1', '2', '3', '4')
    
    partition by actionname, case when action in ('1', '2') then 1 else 2 end
    

    If on the other hand the column is a string, but there are only numbers in that column, fix your table instead.