Search code examples
postgresqlduplicatesleft-join

Postgres SQL: getting rows with a column value that has changed between consecutive dates


I want to gather rows where a ticker value has been added or deleted compared to the previous date. I succesfully got the "added column value" rows using LEFT JOIN, but am struggling to get the deleted ones. The current query runs in about 10 seconds. It selects a particular ETF/Index ("index_of_ref"), and look for added/deleted components ("ticker") between consecutive dates ("trade_date"):

SELECT 
    current_day.trade_date,
    current_day.ticker,
    CASE
        WHEN previous_day.ticker IS NULL THEN '+' -- Ticker added
        WHEN current_day.ticker IS NULL THEN '-' -- Ticker deleted
    END AS change_type
FROM
    t_etf_holdings AS current_day
LEFT JOIN
    t_etf_holdings AS previous_day 
    ON current_day.ticker = previous_day.ticker
    AND current_day.index_of_ref = previous_day.index_of_ref
    AND previous_day.trade_date = (
        SELECT MAX(trade_date)
        FROM t_etf_holdings
        WHERE trade_date < current_day.trade_date
        AND index_of_ref = 'ftse100'
--        AND ticker = current_day.ticker
    )
WHERE
    current_day.index_of_ref = 'ftse100'
AND previous_day.ticker IS NULL
--    AND (
--        (current_day.ticker IS NOT NULL AND previous_day.ticker IS NULL)
--        OR 
--      (current_day.ticker IS NULL AND previous_day.ticker IS NOT NULL)
--    )
ORDER BY
    current_day.trade_date DESC, current_day.ticker;

I believe my LEFT JOIN table logic is wrong and prevents to ever get the deleted records. Any help would be apreciated. I am looking to get any tickers OUT of the INNER JOIN. FULL OUTER JOIN, didn't work the way I used it...

I made a working example following advice of @bnk here: fiddle with inserted data results is:

trade_date  ticker  change_type
2024-01-05  GLEN    +
2024-01-04  AAF +
2024-01-04  AAL +
2024-01-04  WPP +

I would like:

trade_date  ticker  change_type
2024-01-06  GLEN    -
2024-01-05  GLEN    +
2024-01-05  WPP     -
2024-01-04  AAF     +
2024-01-04  AAL     +
2024-01-04  WPP     +

Here is how to reproduce on psql12:

CREATE stmt

 CREATE TABLE IF NOT EXISTS public.t_etf_holdings
(
    trade_date date NOT NULL,
    index_of_ref character varying(25) COLLATE pg_catalog."default" NOT NULL,
    ticker character varying(25) COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT t_etf_holdings_pkey PRIMARY KEY (trade_date, index_of_ref, ticker)
);

INSERT stmt

INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-06', 'ftse100', 'AAF');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-06', 'ftse100', 'AAL');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-05', 'ftse100', 'AAF');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-05', 'ftse100', 'AAL');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-05', 'ftse100', 'GLEN');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-04', 'ftse100', 'AAF');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-04', 'ftse100', 'AAL');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-04', 'ftse100', 'WPP');

See the feedle image: https://dbfiddle.uk/bE7BB7PB


Solution

  • You have to use FULL OUTER JOIN and conditional logic to identify and mark tickers as 'added' or 'deleted' between consecutive trade dates in an ETF/Index dataset.

    WITH PreviousDay AS (
        SELECT trade_date, index_of_ref, ticker
        FROM t_etf_holdings
        WHERE index_of_ref = 'ftse100'
    ), CurrentDay AS (
        SELECT trade_date, index_of_ref, ticker
        FROM t_etf_holdings
        WHERE index_of_ref = 'ftse100'
    )
    SELECT 
        COALESCE(cd.trade_date, pd.trade_date) AS trade_date,
        COALESCE(cd.ticker, pd.ticker) AS ticker,
        CASE
            WHEN pd.ticker IS NULL THEN '+'
            WHEN cd.ticker IS NULL THEN '-'
        END AS change_type
    FROM CurrentDay cd
    FULL OUTER JOIN PreviousDay pd
    ON cd.ticker = pd.ticker
    AND cd.index_of_ref = pd.index_of_ref
    AND pd.trade_date = (
        SELECT MAX(trade_date)
        FROM t_etf_holdings
        WHERE trade_date < cd.trade_date
        AND index_of_ref = 'ftse100'
    )
    WHERE cd.trade_date IS NULL OR pd.trade_date IS NULL
    ORDER BY trade_date DESC, ticker;
    

    The result would be

    trade_date  ticker  change_type
    2024-01-06  AAF -
    2024-01-06  AAL -
    2024-01-05  GLEN    +
    2024-01-05  GLEN    -
    2024-01-04  AAF +
    2024-01-04  AAL +
    2024-01-04  WPP +
    2024-01-04  WPP -