Search code examples
sqlpostgresqlcommon-table-expressionwindow-functions

Convert table data into a ledger format using PostgreSQL query


I have the following data in a PostgreSQL table:

id amount opening_amount closing_amount
1 200 1000 1200
2 -500 NULL NULL
3 -200 NULL NULL
4 1000 NULL NULL

I want to convert this data into a ledger format, calculating the opening_amount and closing_amount based on the previous row's closing_amount and the amount.

The expected result is:

id amount opening_amount closing_amount
1 200 1000 1200
2 -500 1200 700
3 -200 700 500
4 1000 500 1500

I tried using the LAG() function but it gives the wrong result:

id amount opening_amount closing_amount
1 200 1000 1200
2 -500 1200 700
3 -200 1000 800
4 1000 1000 2000

Here is the query I tried:

WITH ledger_data AS (
    SELECT 
        id,
        amount,
        COALESCE(LAG(closing_amount) OVER (ORDER BY id), 1000) AS opening_amount,  -- Assuming first record's opening balance is 1000
        amount + COALESCE(LAG(closing_amount) OVER (ORDER BY id), 1000) AS closing_amount
    FROM (VALUES
        (1, 200, 1000, 1200),
        (2, -500, NULL, NULL),
        (3, -200, NULL, NULL),
        (4, 1000, NULL, NULL)
    ) AS ledger(id, amount, opening_amount, closing_amount)
)
SELECT 
    id,
    amount,
    opening_amount,
    closing_amount
FROM ledger_data;

Issue: The LAG() function seems to be giving incorrect results for subsequent rows. I need to calculate the opening_amount and closing_amount in a cumulative way, where each row depends on the previous row's closing balance.

Can anyone suggest how I can fix this query to correctly generate the ledger format?


Solution

  • As you've found out, an analytic function can't include it's own result in a "subsequent" row. This is because SQL expressions must be able to be evaluated on any row independent of any other row. (SQL sets have no order, and so there's no "subsequent" row.)

    You could use recursive CTEs to iterate over each row one at a time, but that would be cumbersome and (relatively) slow.

    So, instead, keep a running total of each amount, and then do the maths.

    WITH
      ledger_data (
        id, amount, opening_amount, closing_amount
      )
    AS
    (
      VALUES
        (1,  200, 1000, 1200),
        (2, -500, NULL, NULL),
        (3, -200, NULL, NULL),
        (4, 1000, NULL, NULL)
    ),
      running_total AS
    (
      SELECT 
        id,
        amount,
        SUM(opening_amount)
          OVER (ORDER BY id)
            AS opening_amount,
        SUM(amount)
          OVER (ORDER BY id)
            AS cumulative_amount 
      FROM
          ledger_data
    )
    SELECT
      id,
      amount,
      opening_amount + cumulative_amount - amount   AS opening_amount,
      opening_amount + cumulative_amount            AS closing_amount
    FROM
      running_total
    ORDER BY
      id
    ;
    
    id amount opening_amount closing_amount
    1 200 1000 1200
    2 -500 1200 700
    3 -200 700 500
    4 1000 500 1500
    SELECT 4
    

    fiddle