Search code examples
sqloracle-databasegroup-byconditional-statements

SQL Oracle conditional group by


I am struggling with the following problem, any help would be appreciated! I need to take the summation of a variable but the level of the group by dependents on a condition (see 3.), this goes beyond my knowledge of SQL.

So my query should:

  1. Take the summation of AMOUNT per ID and SPLIT (see f.e. ID_1, 100 + 50)
  2. With CUST having the value of the most recent DATE (on a ID, SPLIT level) (see f.e. ID_1, A and B -> A)
  3. If CUST has the same DATE value (on a ID, SPLIT level) it should keep both values, and there should also be a split per CUST in AMOUNT (see f.e. ID_3, 30 and 10 + 10)

My fictional dataset looks like this (script to recreate see bottom):

ID SPLIT CUST DATE AMOUNT
ID_1 SPLIT_YES A 05/01/2024 100
ID_1 SPLIT_NO A 04/01/2024 200
ID_1 SPLIT_YES B 03/01/2024 50
ID_2 SPLIT_YES A 05/01/2024 50
ID_2 SPLIT_NO A 04/01/2024 300
ID_2 SPLIT_NO B 03/01/2024 300
ID_3 SPLIT_YES B 04/01/2024 90
ID_3 SPLIT_NO B 04/01/2024 30
ID_3 SPLIT_NO A 04/01/2024 10
ID_3 SPLIT_NO A 03/01/2024 10

And the final result of the query should be this:

ID SPLIT CUST DATE AMOUNT
ID_1 SPLIT_YES A 05/01/2024 150
ID_1 SPLIT_NO A 04/01/2024 200
ID_2 SPLIT_YES A 05/01/2024 50
ID_2 SPLIT_NO A 04/01/2024 600
ID_3 SPLIT_YES B 04/01/2024 90
ID_3 SPLIT_NO B 04/01/2024 30
ID_3 SPLIT_NO A 04/01/2024 20

Thank you all!

I tried using a lot of different with statements to do it step by step, with no correct result. Using a with statement I am able to solve step 1 and 2 combined by first taking the summation per ID and SPLIT and then join based with the most recent CUST VALUE, but this join is my issue since it makes step 3 impossible.

WITH LatestDatePerID AS (
    SELECT ID, 
    "SPLIT",
           MAX(DATE_COLUMN) AS MAX_DATE
    FROM your_table_name
    GROUP BY ID, "SPLIT"
),
LatestCustPerID AS (
    SELECT t.ID, 
           t.CUST, 
           t."SPLIT",
           t.DATE_COLUMN, 
           t.AMOUNT
    FROM your_table_name t
    JOIN LatestDatePerID l ON t.ID = l.ID AND t.DATE_COLUMN = l.MAX_DATE and t."SPLIT" = l."SPLIT"
)
SELECT ID, 
       CUST,
       "SPLIT",
       DATE_COLUMN, 
       SUM(AMOUNT) AS AMOUNT
FROM LatestCustPerID
GROUP BY ID, "SPLIT", CUST, DATE_COLUMN
ORDER BY ID, DATE_COLUMN DESC;

So it is not taking the summation of the other neglected rows. I am breaking my head over this.

Script to recreate table:

CREATE TABLE Test_Table_MM (
    ID VARCHAR2(10),
    SPLIT VARCHAR2(10),
    CUST VARCHAR2(10),
    DATE_COLUMN DATE,
    AMOUNT NUMBER
);

INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_1', 'SPLIT_YES', 'A', TO_DATE('05/01/2024', 'MM/DD/YYYY'), 100);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_1', 'SPLIT_NO', 'A', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 200);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_1', 'SPLIT_YES', 'B', TO_DATE('03/01/2024', 'MM/DD/YYYY'), 50);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_2', 'SPLIT_YES', 'A', TO_DATE('05/01/2024', 'MM/DD/YYYY'), 50);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_2', 'SPLIT_NO', 'A', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 300);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_2', 'SPLIT_NO', 'B', TO_DATE('03/01/2024', 'MM/DD/YYYY'), 300);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_YES', 'B', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 90);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_NO', 'B', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 30);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_NO', 'A', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 10);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_NO', 'A', TO_DATE('03/01/2024', 'MM/DD/YYYY'), 10);

Solution

  • You can use analytic functions:

    SELECT id, 
           split,
           cust,
           date_column,
           CASE num_cust
           WHEN 1
           THEN total_amount
           ELSE total_cust_amount
           END AS amount
    FROM   (
      SELECT t.*,
             COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
               OVER (PARTITION BY id, split) AS num_cust
      FROM   (
        SELECT t.*,
               DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
               SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
               SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
        FROM   test_table_mm t
      ) t
      WHERE  rnk = 1
    )
    

    Which, for the sample data, outputs:

    ID SPLIT CUST DATE_COLUMN AMOUNT
    ID_1 SPLIT_NO A 2024-04-01 00:00:00 200
    ID_1 SPLIT_YES A 2024-05-01 00:00:00 150
    ID_2 SPLIT_NO A 2024-04-01 00:00:00 600
    ID_2 SPLIT_YES A 2024-05-01 00:00:00 50
    ID_3 SPLIT_NO A 2024-04-01 00:00:00 20
    ID_3 SPLIT_NO B 2024-04-01 00:00:00 30
    ID_3 SPLIT_YES B 2024-04-01 00:00:00 90

    fiddle