Search code examples
sqldata-modelingdbtdata-vault

Add Valid_From & Valid_To Columns to Table w/ only load_timestamp - where relationships can revert back


I have the table below and I want to write a query that add's two columns:

Valid From = Indicating when a relationship between account & opp started

Valid To = Indicating when a relationship between account & opp ended

**NOTE THAT THE INSERT TIMESTAMP IS UPDATED IF ANY FIELD CHANGES IE COLOR NOT JUST IF A RELATIONSHIP CHANGES **

Query to create table in http://sqlfiddle.com/#!9/0bfae1/1

CREATE TABLE accounts (
  account_id VARCHAR(100),
  load_timestamp TIMESTAMP,
  opportunity_id VARCHAR(100),
  COLOR VARCHAR(100)
);

    INSERT INTO accounts VALUES
    ('1', '2023-03-18','a', 'red'),
    ('1', '2023-03-19','a', 'blue'),
    ('2', '2023-03-19','b', 'blue'),
    ('3', '2023-03-20', 'c', 'blue'),
    ('4', '2023-03-21', 'd', 'blue'),
    ('5', '2023-03-22', 'e', 'blue'),
    ('1', '2023-03-23', 'f', 'blue'),
    ('1', '2023-03-24', 'a', 'blue'),
    ('3', '2023-03-24', 'z', 'blue')

I am trying to use the subquery below, but it does not handle the circumstance of a relationship changing and then changing back.

WITH es_account as (
    SELECT
        t1.*,
        t1.load_timestamp AS effective_from,
        LEAD(t1.load_timestamp) OVER (
            PARTITION BY t1.opportunity_id
            ORDER BY
                t1.load_timestamp
        ) AS effective_to
    FROM
        account t1
)
SELECT

*
    
FROM
    es_account;


The desired output can be viewed in sqlfiddle.com with the code below
CREATE TABLE es_accounts (
  account_id VARCHAR(100),
  load_timestamp TIMESTAMP,
  opportunity_id VARCHAR(100),
  effective_from TIMESTAMP,
  effective_to TIMESTAMP
);

INSERT INTO es_accounts VALUES
('1', '2023-03-18','a', '2023-03-18', '2023-03-22' ),
('2', '2023-03-19','b', '2023-03-19', NULL),
('3', '2023-03-20', 'c', '2023-03-20', '2023-03-24'),
('4', '2023-03-21', 'd', '2023-03-21', NULL),
('5', '2023-03-22', 'e', '2023-03-22', NULL),
('1', '2023-03-23', 'f', '2023-03-22', '2023-03-23'),
('1', '2023-03-24', 'a', '2023-03-24', NULL),
('3', '2023-03-24', 'z', '2023-03-24', NULL)

Solution

  • You can use below query to find the desired result if you are using snowflake.

    SELECT  *, CASE WHEN LAG(LOAD_TIMESTAMP) OVER( PARTITION BY  account_id 
    ORDER BY LOAD_TIMESTAMP) IS NULL 
    THEN  LOAD_TIMESTAMP ELSE LOAD_TIMESTAMP END  AS FROM_DATE,
    IFNULL(DATEADD(d,-1, LEAD(LOAD_TIMESTAMP) OVER( PARTITION BY  account_id 
    ORDER BY LOAD_TIMESTAMP )), NULL ) AS TO_DATE
    FROM accounts SRC
    QUALIFY IFNULL(LAG(src.opportunity_id) OVER (PARTITION BY src.account_id  
    ORDER BY SRC.LOAD_TIMESTAMP),'')<> src.opportunity_id
    order by load_timestamp