I want to gather rows where a ticker value has been added or deleted compared to the previous date. I succesfully got the "added column value" rows using LEFT JOIN, but am struggling to get the deleted ones. The current query runs in about 10 seconds. It selects a particular ETF/Index ("index_of_ref"), and look for added/deleted components ("ticker") between consecutive dates ("trade_date"):
SELECT
current_day.trade_date,
current_day.ticker,
CASE
WHEN previous_day.ticker IS NULL THEN '+' -- Ticker added
WHEN current_day.ticker IS NULL THEN '-' -- Ticker deleted
END AS change_type
FROM
t_etf_holdings AS current_day
LEFT JOIN
t_etf_holdings AS previous_day
ON current_day.ticker = previous_day.ticker
AND current_day.index_of_ref = previous_day.index_of_ref
AND previous_day.trade_date = (
SELECT MAX(trade_date)
FROM t_etf_holdings
WHERE trade_date < current_day.trade_date
AND index_of_ref = 'ftse100'
-- AND ticker = current_day.ticker
)
WHERE
current_day.index_of_ref = 'ftse100'
AND previous_day.ticker IS NULL
-- AND (
-- (current_day.ticker IS NOT NULL AND previous_day.ticker IS NULL)
-- OR
-- (current_day.ticker IS NULL AND previous_day.ticker IS NOT NULL)
-- )
ORDER BY
current_day.trade_date DESC, current_day.ticker;
I believe my LEFT JOIN table logic is wrong and prevents to ever get the deleted records. Any help would be apreciated. I am looking to get any tickers OUT of the INNER JOIN. FULL OUTER JOIN, didn't work the way I used it...
I made a working example following advice of @bnk here: fiddle with inserted data results is:
trade_date ticker change_type
2024-01-05 GLEN +
2024-01-04 AAF +
2024-01-04 AAL +
2024-01-04 WPP +
I would like:
trade_date ticker change_type
2024-01-06 GLEN -
2024-01-05 GLEN +
2024-01-05 WPP -
2024-01-04 AAF +
2024-01-04 AAL +
2024-01-04 WPP +
Here is how to reproduce on psql12:
CREATE stmt
CREATE TABLE IF NOT EXISTS public.t_etf_holdings
(
trade_date date NOT NULL,
index_of_ref character varying(25) COLLATE pg_catalog."default" NOT NULL,
ticker character varying(25) COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT t_etf_holdings_pkey PRIMARY KEY (trade_date, index_of_ref, ticker)
);
INSERT stmt
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-06', 'ftse100', 'AAF');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-06', 'ftse100', 'AAL');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-05', 'ftse100', 'AAF');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-05', 'ftse100', 'AAL');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-05', 'ftse100', 'GLEN');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-04', 'ftse100', 'AAF');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-04', 'ftse100', 'AAL');
INSERT INTO t_etf_holdings (trade_date, index_of_ref, ticker)
VALUES('2024-01-04', 'ftse100', 'WPP');
You have to use FULL OUTER JOIN and conditional logic to identify and mark tickers as 'added' or 'deleted' between consecutive trade dates in an ETF/Index dataset.
WITH PreviousDay AS (
SELECT trade_date, index_of_ref, ticker
FROM t_etf_holdings
WHERE index_of_ref = 'ftse100'
), CurrentDay AS (
SELECT trade_date, index_of_ref, ticker
FROM t_etf_holdings
WHERE index_of_ref = 'ftse100'
)
SELECT
COALESCE(cd.trade_date, pd.trade_date) AS trade_date,
COALESCE(cd.ticker, pd.ticker) AS ticker,
CASE
WHEN pd.ticker IS NULL THEN '+'
WHEN cd.ticker IS NULL THEN '-'
END AS change_type
FROM CurrentDay cd
FULL OUTER JOIN PreviousDay pd
ON cd.ticker = pd.ticker
AND cd.index_of_ref = pd.index_of_ref
AND pd.trade_date = (
SELECT MAX(trade_date)
FROM t_etf_holdings
WHERE trade_date < cd.trade_date
AND index_of_ref = 'ftse100'
)
WHERE cd.trade_date IS NULL OR pd.trade_date IS NULL
ORDER BY trade_date DESC, ticker;
The result would be
trade_date ticker change_type
2024-01-06 AAF -
2024-01-06 AAL -
2024-01-05 GLEN +
2024-01-05 GLEN -
2024-01-04 AAF +
2024-01-04 AAL +
2024-01-04 WPP +
2024-01-04 WPP -