Search code examples
sqlsql-serverfiltersumoffset

Query to filter out transactions with exact offset amount


The query below is a rough draft containing the relevant columns in the query I'd like to write, so don't look at it as a solution. Use it as a guide for the table and column names. I'm trying to remove any transactions that offset each other for the same ORDER_ID and ACCOUNT_ID. I don't think I can do an aggregate using SUM, since that would add together all TX_AMOUNT values for the grouping. See TX_ID 6 and 7. Those both need to show in the result set. How can I output the TX_ID from the table below, and filter out anything that doesn't say "SHOW THIS"?

SELECT 
T1.ACCOUNT_ID
T1.ORDER_ID,
T1.TX_ID
FROM TRANSACTION AS T1
WHERE
T1.ACCOUNT_ID IN (
SELECT T2.ACCOUNT_ID
FROM TRANSACTION AS T2
GROUP BY T2.ACCOUNT_ID, T2.ORDER_ID
HAVING SUM(T2.TX_AMOUNT) != 0 AND T2.ORDER_ID IS NOT NULL 
)
AND T1.ORDER_ID IN (
SELECT T3.ORDER_ID
FROM TRANSACTION AS T3
GROUP BY T3.ACCOUNT_ID, T3.ORDER_ID
HAVING SUM(T3.TX_AMOUNT) != 0 AND T3.ORDER_ID IS NOT NULL 
)

TX_ID   ORDER_ID ACCOUNT_ID  TX_AMOUNT
------------------------------------
1       A1       200         -3.00  <--------- DON'T SHOW THIS; OFFSET BY #2
2       A1       200         3.00   <--------- DON'T SHOW THIS; OFFSET BY #1
3       A1       200         3.00   <--------- SHOW THIS
4       A2       999         -10.01 <--------- DON'T SHOW THIS; OFFSET BY #5
5       A2       999         10.01  <--------- DON'T SHOW THIS; OFFSET BY #4
6       A2       999         10.01  <--------- SHOW THIS
7       A2       999         5.02   <--------- SHOW THIS

Solution

  • VERSION 2: MUCH cleaner... Working DEMO with comments (you may need to click Run it!) to see desired results (or maybe I have a caching problem)

    • The CTE (Common Table Expression) is just your data setup that you show
    • The CTE2 simply adds a row number partitioned by the tx_amount, order_Id, account_Id. Key here is the fact that we get a row_number for each order_ID, Account_ID and tax_Amount restarting when those 3 values change but incrementing when they stay the same. This later allows us to exclude like matches on opposite tx_amounts without eliminating those when one side has more than the other (your $3.00 example)
    • The Select simply pulls in records from the base set where the opposite value with the same row number order_id and account exist. if one doesn't then we know it's a value w/o a matching opposite tx_Amount and thus one we want to keep.
    • Ask if you have questions! Happy to help if something's not clear
    • Lastly if we change CTE2 so the rowNumber() is ordered by tx_ID asc instead of tx_Amount desc (which really serves no purpose other than I needed an order by on the row_nubmer), then we'd get rid of the lowest number matches first consistently following the FIFO approach)

    .

    With CTE (TX_ID,   ORDER_ID, ACCOUNT_ID,  TX_AMOUNT) as (
    
    SELECT 1,       'A1',       200,         -3.00  UNION ALL
    SELECT 2,       'A1',       200,         3.00   UNION ALL
    SELECT 3,       'A1',       200,         3.00   UNION ALL
    SELECT 4,       'A2',       999,         -10.01 UNION ALL
    SELECT 5,       'A2',       999,         10.01  UNION ALL
    SELECT 6,       'A2',       999,         10.01  UNION ALL
    SELECT 7,       'A2',       999,         5.02 ),
    
    cte2 as (
    SELECT A.*, row_number() over (partition by order_ID, Account_ID, Tx_Amount order by tx_Amount desc) RN
    FROM cte A)
    
    SELECT * 
    FROM cte2 A
    WHERE NOT exists (SELECT * 
                      FROM cte2 B
                      WHERE A.Order_ID = B.Order_ID
                        and A.Account_ID = B.Account_Id
                        and A.tx_Amount*-1 = B.tx_Amount
                        and A.RN = B.RN)
    

    Giving us: (note we should eliminate the RN by changing * to desired fields but i'm too lazy at this point)

    +----+-------+----------+------------+-----------+----+
    |    | TX_ID | ORDER_ID | ACCOUNT_ID | TX_AMOUNT | RN |
    +----+-------+----------+------------+-----------+----+
    |  1 |     2 | A1       |        200 |      3,00 |  2 |
    |  2 |     7 | A2       |        999 |      5,02 |  1 |
    |  3 |     5 | A2       |        999 |     10,01 |  2 |
    +----+-------+----------+------------+-----------+----+
    

    VERSION 1: (Scratch this ugly; I mean seriously; who thinks like this?) I do...

    1. Do something. (version 1)
    2. Laugh(an important and overlooked step)
    3. Then do it right (See version 2 above)
    4. Now make it better. (indexes, tweak joins spelling, layout comments, use the right order by on the row_number logic in CTE2)

    DEMO

    With CTE (TX_ID,   ORDER_ID, ACCOUNT_ID,  TX_AMOUNT) as (
    
    SELECT 1,       'A1',       200,         -3.00  UNION ALL
    SELECT 2,       'A1',       200,         3.00   UNION ALL
    SELECT 3,       'A1',       200,         3.00   UNION ALL
    SELECT 4,       'A2',       999,         -10.01 UNION ALL
    SELECT 5,       'A2',       999,         10.01  UNION ALL
    SELECT 6,       'A2',       999,         10.01  UNION ALL
    SELECT 7,       'A2',       999,         5.02 ),
    cte2 as (
    SELECT * 
    FROM (Select A.Tx_Id aTx_ID
               , A.order_ID as AOrderID
               , A.Account_ID as AAccount_ID
               , A.tx_Amount as ATx_Amount
               , Row_number() over (partition by Order_ID, Account_ID, tx_Amount order by tx_Amount asc) ARN
    
          from cte a 
          WHERE tx_Amount <=0) A
    FULL OUTER JOIN (SELECT b.tx_Id
                          , b.order_Id
                          , b.Account_Id
                          , b.tx_Amount
                          ,  Row_number() over (partition by Order_ID, Account_ID, tx_Amount order by tx_Amount desc) BRN 
                     FROM  CTE B 
                     WHERE  tx_Amount>0) B
      on A.AOrderID = B.Order_ID
     and A.AAccount_ID = B.Account_ID
     and A.ATx_Amount*-1 = B.tx_Amount
     and A.ARN=B.BRN
    Where a.Atx_ID is null
      or B.tx_ID is null)
    
      Select ATX_ID, AORDERID, AAccount_ID, ATX_AMOUNT from cte2 where ATX_ID is not null
      UNION ALL
      Select TX_ID, ORDER_ID, Account_ID, TX_AMOUNT from cte2 where TX_ID is not null