I have a bunch of data as below, which I have already extracted from the DB and already ordered.
ExecutionTime BatchKey Account Instrument UserName
2020-10-16 17:27:03.600 1 1234 ABOK HornedFrogs
2020-10-16 17:27:03.600 1 EGO321 ABOK HouseAccount
2020-10-16 21:14:04.483 1 EGO321 ABOK HouseAccount
2020-10-16 21:14:04.483 1 1234 ABOK BrokerAccount
2020-10-16 14:28:10.973 2 1098 ABOK BaylorBears
2020-10-16 14:28:10.973 2 EGO321 ABOK HouseAccount
2020-10-16 14:28:10.973 2 EGO321 ABOK HouseAccount
2020-10-16 14:28:10.973 2 1234 ABOK BrokerAccount
2020-10-16 21:14:04.473 2 EGO321 ABOK BrokerAccount
2020-10-16 21:14:04.473 2 1234 ABOK CustodianAccount
2020-10-16 12:26:21.503 3 1098 ABOK SMUMustangs
2020-10-16 12:26:21.503 3 EGO321 ABOK HouseAccount
2020-10-16 12:26:21.503 3 EGO321 ABOK HouseAccount
2020-10-16 12:26:21.503 3 1234 ABOK BrokerAccount
2020-10-16 21:14:04.377 3 EGO321 ABOK BrokerAccount
2020-10-16 21:14:04.377 3 1234 ABOK CustodianAccount
2020-10-16 07:14:04.411 4 1234 ABOK BrokerAccount
The above data is organized by the BatchKey, so everything in a BatchKey is grouped together.
Each BatchKey is sorted by the ExecutionTime, so the earliest one is on the top, and the latest one is at the bottom.
I would like to use this as a smaller table, and extract the rows based on the below logic:
For every BatchKey:
-if the BatchKey has more than 2 entries, select the earliest (top) and the latest (bottom) entries in the batch
-if the BatchKey has exactly 2 entries, select them
-(else) if the BatchKey has only 1 entry, select it
The result should look like below:
The query should extract the below rows
ExecutionTime BatchKey Account Instrument UserName
2020-10-16 17:27:03.600 1 1234 ABOK HornedFrogs
2020-10-16 21:14:04.483 1 1234 ABOK BrokerAccount
2020-10-16 14:28:10.973 2 1098 ABOK BaylorBears
2020-10-16 21:14:04.473 2 1234 ABOK CustodianAccount
2020-10-16 12:26:21.503 3 1098 ABOK SMUMustangs
2020-10-16 21:14:04.377 3 1234 ABOK CustodianAccount
2020-10-16 07:14:04.411 4 1234 ABOK BrokerAccount
Is there any way to run such query in SQL?
Use row_number()
:
select *
from (
select t.*,
row_number() over(partition by batchkey order by executiontime) rn_asc,
row_number() over(partition by batchkey order by executiontime desc) rn_desc
from mytable t
) t
where 1 in (rn_asc, rn_desc)
The logic is to rank records having the same batchkey
by ascending and descending executiontime
, then to filter on the the top and bottom row numbers. If there is just one record for a given batchkey
, it gets ranked 1
in both directions.