I want to filter a table eliminating the rows that give zero in a sum over an "amount" column and that are grouped by an Id, for example, if I have "-10, 10 and 15" for the same Id. I want to eliminate the rows that have -10 and 10 and stay with row 15
So far I was able to filter the table using GROUP BY
but I need to get the ID to do a select again on the same table, also I need the SUM
split.
CREATE Table #myData
(
Id INT,
DateInit date,
DebtorName varchar(10),
Trancode varchar(10),
Amount VARCHAR(50),
DebtorID varchar(50),
ClientRef NVARCHAR(256)
)
INSERT INTO #myData
VALUES
(1, '1/1/2022', 'BURNETTE', '13 P', '-53.42', '23916762', 'T081819513107-00009')
,(2, '1/1/2022', 'BURNETTE', '13 P', '-28.83', '23916761', 'T081819513107-00008')
,(3, '1/1/2022', 'BURNETTE', '13 P', '-12.08', '23916764', 'T081819513107-00010')
,(4, '1/1/2022', 'BURNETTE', '1 P', '12.08', '23916764', 'T081819513107-00010')
,(5, '1/1/2022', 'BURNETTE', '1 P', '28.83', '23916761', 'T081819513107-00008')
,(6, '1/1/2022', 'BURNETTE', '1 P', '53.42', '23916762', 'T081819513107-00009')
,(7, '1/1/2022', 'BURNETTE', '2 P', '66.41', '23916762', 'T081819513107-00009')
,(8, '1/1/2022', 'BURNETTE', '2 P', '71.10', '23916762', 'T081819513107-00009')
,(9, '1/1/2022', 'BURNETTE', '2 P', '12.08', '23916764', 'T081819513107-00010')
SELECT
DebtorID,
SUM(CASE
WHEN TRY_PARSE(A.Amount AS float) IS NULL
THEN (-1) * CAST(SUBSTRING(A.Amount, CHARINDEX('-', A.amount) + 1, LEN(A.Amount)) AS float)
ELSE CAST(A.Amount AS float)
END) AS amountSum
FROM
#myData A
GROUP BY
DebtorID
HAVING
SUM(CASE
WHEN TRY_PARSE(A.Amount AS float) IS NULL
THEN (-1) * CAST(SUBSTRING(A.Amount, CHARINDEX('-', A.Amount) + 1, LEN(A.Amount)) AS float)
ELSE CAST(A.Amount AS float)
END) != 0
But this query returns:
DebtorID | amountSum |
---|---|
23916762 | 137.51 |
23916764 | 12.08 |
The ultimate goal is to filter these results and save them in another table identical to the original (#myData
).
This is what I want to achieve for input #myData :
Id | DateInit | DebtorName | Trancode | amount | DebtorID | ClientRef |
---|---|---|---|---|---|---|
6 | 1/1/2022 | BURNETTE | 2 P | 66.41 | 23916762 | T081819513107-00009 |
8 | 1/1/2022 | BURNETTE | 2 p | 71.10 | 23916762 | T081819513107-00009 |
8 | 1/1/2022 | BURNETTE | 2 p | 12.08 | 23916764 | T081819513107-00010 |
I would appreciate any help, thanks!
thanks for the help, but this scenario has to have the same output.
Id | DateInit | DebtorName | Trancode | amount | DebtorID | ClientRef |
---|---|---|---|---|---|---|
1 | 1/1/2022 | BURNETTE | 13 P | -53.42 | 23916762 | T081819513107-00009 |
2 | 1/1/2022 | BURNETTE | 13 P | -12.08 | 23916764 | T081819513107-00010 |
3 | 1/1/2022 | BURNETTE | 1 p | 12.08 | 23916764 | T081819513107-00010 |
4 | 1/1/2022 | BURNETTE | 1 p | 28.83 | 23916761 | T081819513107-00008 |
5 | 1/1/2022 | BURNETTE | 1 p | 66.41 | 23916762 | T081819513107-00009 |
6 | 1/1/2022 | BURNETTE | 13 p | -28.83 | 23916761 | T081819513107-00008 |
7 | 1/1/2022 | BURNETTE | 1 p | 53.42 | 23916762 | T081819513107-00009 |
8 | 1/1/2022 | BURNETTE | 1 p | 71.10 | 23916762 | T081819513107-00009 |
9 | 1/1/2022 | BURNETTE | 2 p | 12.08 | 23916764 | T081819513107-00010 |
the expected result output:
Id | DateInit | DebtorName | Trancode | amount | DebtorID | ClientRef |
---|---|---|---|---|---|---|
5 | 1/1/2022 | BURNETTE | 1 p | 66.41 | 23916762 | T081819513107-00009 |
8 | 1/1/2022 | BURNETTE | 1 p | 71.10 | 23916762 | T081819513107-00009 |
9 | 1/1/2022 | BURNETTE | 2 p | 12.08 | 23916764 | T081819513107-00010 |
If you need the original data set but want to include an aggregate or want to filter or sort based on an aggregate then Window Functions can help you out a lot.
UPDATE: The nature of the question has changed slightly, however the original advice is still the same, we don't want to use
GROUP BY
because we lose the detail rows, if we must do this in a sing expression then Window Functions provide a simple syntax solution
In this solution I've used a CTE to separate out the formatting of the amount column and the calculation of the aggregate, but you could have used CROSS APPLY
for the formatting.
Notice in the solution we are NOT applying a
SUM
at all, we are merely matching values, this means the same solution with work for non--numeric values
In the fiddle I've tried to detail some of the steps but ultimately, we process the data in the following sequence:
Amount
column into a numeric valueAmount
and DebtorID
values using ROW_NUMBER()
window functionWITH FormattedData as (
SELECT ID, DebtorID, COALESCE(TRY_PARSE(amount as float)
, (-1)*CAST(SUBSTRING(amount,CHARINDEX('-',amount)+1,LEN(amount)) AS float)
) as Amount
FROM #myData
)
, DataSequenced as (
SELECT a.ID, a.DebtorID, a.Amount, ROW_NUMBER() OVER (PARTITION BY a.DebtorID, a.Amount ORDER BY a.ID) as RN
FROM FormattedData a
)
, MatchesRemoved as (
SELECT a.ID, a.DebtorID, A.Amount
FROM DataSequenced a
WHERE NOT EXISTS (SELECT lkp.ID
FROM DataSequenced lkp
WHERE lkp.DebtorID = a.DebtorID
AND lkp.Amount = -a.Amount
AND lkp.RN = a.RN)
)
SELECT myData.ID, DateInit, DebtorName, Trancode, a.Amount, myData.DebtorID
, myData.ClientRef
FROM MatchesRemoved a
INNER JOIN #myData myData ON myData.ID = a.ID
ORDER BY ID
See it in this fiddle: https://dbfiddle.uk/XY8h2uvG
Because we are not specifically applying an aggregate on the Amount
column, it doesn't need to be numeric, we can actually use simple string comparisons. You will have to profile this in your environment, but the following query is similar:
WITH DataSequenced as (
SELECT a.ID, a.DebtorID, a.Amount
, ROW_NUMBER() OVER (PARTITION BY a.DebtorID, a.Amount ORDER BY a.ID) as RN
, CASE
WHEN LEFT(a.Amount,1) = '-' THEN RTRIM(RIGHT(a.Amount, LEN(a.Amount) - 1))
ELSE RTRIM(a.Amount)
END as AbsAmount
FROM #myData a
)
, MatchesRemoved as (
SELECT a.ID, a.DebtorID, A.Amount
FROM DataSequenced a
WHERE NOT EXISTS (SELECT lkp.ID
FROM DataSequenced lkp
WHERE lkp.DebtorID = a.DebtorID
AND lkp.AbsAmount = a.AbsAmount
AND lkp.ID <> a.ID
AND lkp.RN = a.RN)
)
SELECT myData.ID, DateInit, DebtorName, Trancode, a.Amount, myData.DebtorID
, myData.ClientRef
FROM MatchesRemoved a
INNER JOIN #myData myData ON myData.ID = a.ID
ORDER BY ID
Depending on the complexity of your datasource and the indexes that might be available (or NOT available) It might make more sense to process this in multiple queries against temporary tables.
The following are previous attempts to address the original request. I choose to leave these in due to the ambiguity of the OP title, the though process and solutions (including the fiddles) might prove helpful to others who find themselves on this page.
Find the transactions where the SUM of the Debtor is ZERO : This was interpreted as return all transactions for Debtors who do not have a ZERO balance.
WITH FormattedData as (
SELECT ID, DebtorID, COALESCE(TRY_PARSE(amount as float)
, (-1)*CAST(SUBSTRING(amount,CHARINDEX('-',amount)+1,LEN(amount)) AS float)
) as Amount
FROM #myData
)
, DataWithSum as (
SELECT ID, DebtorID, Amount, SUM(Amount) OVER( Partition By DebtorID) AS SumAmount
FROM FormattedData
)
SELECT myData.ID, myData.DebtorID, D.Amount, D.SumAmount
, myData.ClientRef
FROM DataWithSum D
-- join back if you need additional columns
INNER JOIN #myData myData ON myData.ID = D.ID
WHERE SumAmount > 0
ID | DebtorID | Amount | SumAmount | ClientRef |
---|---|---|---|---|
1 | 23916762 | -53.42 | 137.51 | T081819513107-00009 |
3 | 23916764 | -12.08 | 12.08 | T081819513107-00010 |
4 | 23916764 | 12.08 | 12.08 | T081819513107-00010 |
6 | 23916762 | 53.42 | 137.51 | T081819513107-00009 |
7 | 23916762 | 66.41 | 137.51 | T081819513107-00009 |
8 | 23916762 | 71.1 | 137.51 | T081819513107-00009 |
9 | 23916764 | 12.08 | 12.08 | T081819513107-00010 |
You could also have joined the result of your GROUP BY
back onto the original set, if your HAVING
clause was going to remove a LOT of records, then this can be more efficient than the Window Function
If we need to balance out the transactions, then a simple solution is that we can compute the running balance for all previous rows with the same DebtorID
. In this fiddle you can see the working out: https://dbfiddle.uk/DVRoMoNd
I turns out that even though the previous fiddle returned the correct answer, it wouldn't work if the transaction history had previously gone into negatives and then back to zero. The following query finds the last time that the running balance was less than or equal to zero and returns all the transactions after that: https://dbfiddle.uk/exwm9OwA
-- if we only want the transactions since the last time the account was in balance
WITH FormattedData as (
SELECT ID, DebtorID, COALESCE(TRY_PARSE(amount as float)
, (-1)*CAST(SUBSTRING(amount,CHARINDEX('-',amount)+1,LEN(amount)) AS float)
) as Amount
FROM #myData
)
, DataWithRunningBalance as (
SELECT a.ID, a.DebtorID, a.Amount, (
SELECT SUM(Amount)
FROM FormattedData lkp
WHERE lkp.DebtorID = a.DebtorID
AND lkp.ID <= a.ID
) AS SumAmount
FROM FormattedData a
)
SELECT myData.ID, myData.DebtorID, A.Amount
, myData.ClientRef
FROM FormattedData A
INNER JOIN #myData myData ON myData.ID = A.ID
WHERE A.ID > (
SELECT TOP 1 ID
FROM DataWithRunningBalance B
WHERE B.DebtorID = myData.DebtorID
AND SumAmount <= 0
ORDER BY ID DESC
)
ORDER BY ID
ID | DebtorID | Amount | ClientRef |
---|---|---|---|
7 | 23916762 | 66.41 | T081819513107-00009 |
8 | 23916762 | 71.1 | T081819513107-00009 |
9 | 23916764 | 12.08 | T081819513107-00010 |
The sequence of the data is important to the SumAmount
column, in this case I couldn't use the timestamp because there was not enough granularity in the values, ID
was a natural fit for this dataset. In your production code there might be a a more suitable time based candidate.
The order of the negative vs positive values does not matter, SumAmount
is a simple running balance with respect to time of entry.