Search code examples
sqlsql-serverdatabasedatetimegreatest-n-per-group

SQL Query select top and bottom rows from every batch in the table


I have a bunch of data as below, which I have already extracted from the DB and already ordered.

ExecutionTime            BatchKey   Account   Instrument    UserName
2020-10-16 17:27:03.600     1        1234      ABOK        HornedFrogs
2020-10-16 17:27:03.600     1        EGO321    ABOK        HouseAccount
2020-10-16 21:14:04.483     1        EGO321    ABOK        HouseAccount
2020-10-16 21:14:04.483     1        1234      ABOK        BrokerAccount
2020-10-16 14:28:10.973     2        1098      ABOK        BaylorBears
2020-10-16 14:28:10.973     2        EGO321    ABOK        HouseAccount
2020-10-16 14:28:10.973     2        EGO321    ABOK        HouseAccount
2020-10-16 14:28:10.973     2        1234      ABOK        BrokerAccount
2020-10-16 21:14:04.473     2        EGO321    ABOK        BrokerAccount
2020-10-16 21:14:04.473     2        1234      ABOK        CustodianAccount
2020-10-16 12:26:21.503     3        1098      ABOK        SMUMustangs
2020-10-16 12:26:21.503     3        EGO321    ABOK        HouseAccount
2020-10-16 12:26:21.503     3        EGO321    ABOK        HouseAccount
2020-10-16 12:26:21.503     3        1234      ABOK        BrokerAccount
2020-10-16 21:14:04.377     3        EGO321    ABOK        BrokerAccount
2020-10-16 21:14:04.377     3        1234      ABOK        CustodianAccount
2020-10-16 07:14:04.411     4        1234      ABOK        BrokerAccount

The above data is organized by the BatchKey, so everything in a BatchKey is grouped together.

Each BatchKey is sorted by the ExecutionTime, so the earliest one is on the top, and the latest one is at the bottom.

I would like to use this as a smaller table, and extract the rows based on the below logic:

For every BatchKey:

-if the BatchKey has more than 2 entries, select the earliest (top) and the latest (bottom) entries in the batch

-if the BatchKey has exactly 2 entries, select them

-(else) if the BatchKey has only 1 entry, select it

The result should look like below:

The query should extract the below rows

ExecutionTime            BatchKey   Account   Instrument    UserName
2020-10-16 17:27:03.600     1        1234      ABOK        HornedFrogs
2020-10-16 21:14:04.483     1        1234      ABOK        BrokerAccount
2020-10-16 14:28:10.973     2        1098      ABOK        BaylorBears
2020-10-16 21:14:04.473     2        1234      ABOK        CustodianAccount
2020-10-16 12:26:21.503     3        1098      ABOK        SMUMustangs
2020-10-16 21:14:04.377     3        1234      ABOK        CustodianAccount
2020-10-16 07:14:04.411     4        1234      ABOK        BrokerAccount

Is there any way to run such query in SQL?


Solution

  • Use row_number():

    select *
    from (
        select t.*, 
            row_number() over(partition by batchkey order by executiontime) rn_asc,
            row_number() over(partition by batchkey order by executiontime desc) rn_desc
        from mytable t
    ) t
    where 1 in (rn_asc, rn_desc)
    

    The logic is to rank records having the same batchkey by ascending and descending executiontime, then to filter on the the top and bottom row numbers. If there is just one record for a given batchkey, it gets ranked 1 in both directions.