I am struggling with the following problem, any help would be appreciated! I need to take the summation of a variable but the level of the group by dependents on a condition (see 3.), this goes beyond my knowledge of SQL.
So my query should:
My fictional dataset looks like this (script to recreate see bottom):
ID | SPLIT | CUST | DATE | AMOUNT |
---|---|---|---|---|
ID_1 | SPLIT_YES | A | 05/01/2024 | 100 |
ID_1 | SPLIT_NO | A | 04/01/2024 | 200 |
ID_1 | SPLIT_YES | B | 03/01/2024 | 50 |
ID_2 | SPLIT_YES | A | 05/01/2024 | 50 |
ID_2 | SPLIT_NO | A | 04/01/2024 | 300 |
ID_2 | SPLIT_NO | B | 03/01/2024 | 300 |
ID_3 | SPLIT_YES | B | 04/01/2024 | 90 |
ID_3 | SPLIT_NO | B | 04/01/2024 | 30 |
ID_3 | SPLIT_NO | A | 04/01/2024 | 10 |
ID_3 | SPLIT_NO | A | 03/01/2024 | 10 |
And the final result of the query should be this:
ID | SPLIT | CUST | DATE | AMOUNT |
---|---|---|---|---|
ID_1 | SPLIT_YES | A | 05/01/2024 | 150 |
ID_1 | SPLIT_NO | A | 04/01/2024 | 200 |
ID_2 | SPLIT_YES | A | 05/01/2024 | 50 |
ID_2 | SPLIT_NO | A | 04/01/2024 | 600 |
ID_3 | SPLIT_YES | B | 04/01/2024 | 90 |
ID_3 | SPLIT_NO | B | 04/01/2024 | 30 |
ID_3 | SPLIT_NO | A | 04/01/2024 | 20 |
Thank you all!
I tried using a lot of different with statements to do it step by step, with no correct result. Using a with statement I am able to solve step 1 and 2 combined by first taking the summation per ID and SPLIT and then join based with the most recent CUST VALUE, but this join is my issue since it makes step 3 impossible.
WITH LatestDatePerID AS (
SELECT ID,
"SPLIT",
MAX(DATE_COLUMN) AS MAX_DATE
FROM your_table_name
GROUP BY ID, "SPLIT"
),
LatestCustPerID AS (
SELECT t.ID,
t.CUST,
t."SPLIT",
t.DATE_COLUMN,
t.AMOUNT
FROM your_table_name t
JOIN LatestDatePerID l ON t.ID = l.ID AND t.DATE_COLUMN = l.MAX_DATE and t."SPLIT" = l."SPLIT"
)
SELECT ID,
CUST,
"SPLIT",
DATE_COLUMN,
SUM(AMOUNT) AS AMOUNT
FROM LatestCustPerID
GROUP BY ID, "SPLIT", CUST, DATE_COLUMN
ORDER BY ID, DATE_COLUMN DESC;
So it is not taking the summation of the other neglected rows. I am breaking my head over this.
Script to recreate table:
CREATE TABLE Test_Table_MM (
ID VARCHAR2(10),
SPLIT VARCHAR2(10),
CUST VARCHAR2(10),
DATE_COLUMN DATE,
AMOUNT NUMBER
);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_1', 'SPLIT_YES', 'A', TO_DATE('05/01/2024', 'MM/DD/YYYY'), 100);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_1', 'SPLIT_NO', 'A', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 200);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_1', 'SPLIT_YES', 'B', TO_DATE('03/01/2024', 'MM/DD/YYYY'), 50);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_2', 'SPLIT_YES', 'A', TO_DATE('05/01/2024', 'MM/DD/YYYY'), 50);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_2', 'SPLIT_NO', 'A', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 300);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_2', 'SPLIT_NO', 'B', TO_DATE('03/01/2024', 'MM/DD/YYYY'), 300);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_YES', 'B', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 90);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_NO', 'B', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 30);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_NO', 'A', TO_DATE('04/01/2024', 'MM/DD/YYYY'), 10);
INSERT INTO Test_Table_MM (ID, SPLIT, CUST, DATE_COLUMN, AMOUNT) VALUES ('ID_3', 'SPLIT_NO', 'A', TO_DATE('03/01/2024', 'MM/DD/YYYY'), 10);
You can use analytic functions:
SELECT id,
split,
cust,
date_column,
CASE num_cust
WHEN 1
THEN total_amount
ELSE total_cust_amount
END AS amount
FROM (
SELECT t.*,
COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
OVER (PARTITION BY id, split) AS num_cust
FROM (
SELECT t.*,
DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
FROM test_table_mm t
) t
WHERE rnk = 1
)
Which, for the sample data, outputs:
ID | SPLIT | CUST | DATE_COLUMN | AMOUNT |
---|---|---|---|---|
ID_1 | SPLIT_NO | A | 2024-04-01 00:00:00 | 200 |
ID_1 | SPLIT_YES | A | 2024-05-01 00:00:00 | 150 |
ID_2 | SPLIT_NO | A | 2024-04-01 00:00:00 | 600 |
ID_2 | SPLIT_YES | A | 2024-05-01 00:00:00 | 50 |
ID_3 | SPLIT_NO | A | 2024-04-01 00:00:00 | 20 |
ID_3 | SPLIT_NO | B | 2024-04-01 00:00:00 | 30 |
ID_3 | SPLIT_YES | B | 2024-04-01 00:00:00 | 90 |