Search code examples
sqlsql-serverlag

Track Latest Changes of a Column


Here is the desired outcome: when a sale is created, it is considered a 'Lead', and we have a different LeadStage. Once a Lead is secured, it will be converted to an Opportunity with an alternative stage name. So what I have built, is when a new record has been inserted, it will get the stage name from the previous row:

SELECT [DWKey]
  , [ObjectChangeId]
  , [OriginalSalesLeadId]
  , [OpportunityStage]
  , [LeadStage]
  , CASE WHEN CRMLeadOpportunity IS NOT NULL
  THEN LAG(COALESCE(OpportunityStage, LeadStage), 1, COALESCE(OpportunityStage, LeadStage))
    OVER (PARTITION BY originalSalesLeadId ORDER BY DWkey)
  ELSE NULL END AS PreviousStage
FROM [BoyumDataWarehouse].[dbo].[DimSalesLeadAttributes]
WHERE OriginalSalesLeadId = 20240220

Here is the output:

DWKey OriginalSalesLeadId LeadStage OpportunityStage PreviousStage
107309 20240220 SAL NULL SAL
109442 20240220 NULL Evaluating SAL
111224 20240220 NULL Evaluating Evaluating
111458 20240220 NULL Evaluating Evaluating
111730 20240220 NULL Lost Evaluating
111983 20240220 NULL Lost Lost
113011 20240220 NULL Lost Lost

However, I would like to keep the old Stage in the PreviousStage column instead of stage from Previous Row. So my desired output would be:

DWKey OriginalSalesLeadId LeadStage OpportunityStage PreviousStage
107309 20240220 SAL NULL NULL
109442 20240220 NULL Evaluating SAL
111224 20240220 NULL Evaluating SAL
111458 20240220 NULL Evaluating SAL
111730 20240220 NULL Lost Evaluating
111983 20240220 NULL Lost Evaluating
113011 20240220 NULL Lost Evaluating

I have been trying to find a work-around but no luck, unfortunately. Therefore, I need your help to achieve it.

Here is the DDL:

  CREATE TABLE [dbo].[DimSalesLeadAttributes](
    [DWKey] [int]  NOT NULL,
    [OriginalSalesLeadId] [int] NOT NULL,
    [LeadStage] [nvarchar](100) NULL,
    [OpportunityStage] [nvarchar](100) NULL,
    [PreviousStages] [nvarchar](50) NULL) ON [PRIMARY]

& DML:

INSERT INTO [dbo].[DimSalesLeadAttributes] ([DWKey],[OriginalSalesLeadId],[OpportunityStage],[LeadStage],[PreviousStages])
VALUES(107309,20240220,NULL,'SAL',NULL),
(109442,20240220,'Evaluating',NULL,NULL),
(111224,20240220,'Evaluating',NULL,NULL),
(111458,20240220,'Evaluating',NULL,NULL),
(111730,20240220,'Lost',NULL,NULL),
(111983,20240220,'Lost',NULL,NULL),
(113011,20240220,'Lost',NULL,NULL)

Solution

  • You will have to subselect from a intermediate table (CTE) where you isolated or flagged the steps where the stage changed:

    With SQL Server 2022

    (thanks to @Dale K)

    You can rely on last_value(…) ignore nulls to associate to each row, the last time it transitioned,
    with a twist to make non-changing rows return null:

    with
    -- Optional intermediate table to hold the result of the coalesce, for subsequent clarity:
    stage as (select *, coalesce(OpportunityStage, LeadStage) stage from DimSalesLeadAttributes),
    -- Add the previous stage to each entry:
    prev as
    (
        select
            *,
            lag(stage) over (partition by OriginalSalesLeadId order by DWKey) prev
        from stage
    )
    SELECT [DWKey]
      , [ObjectChangeId]
      , [OriginalSalesLeadId]
      , [OpportunityStage]
      , [LeadStage]
      , CASE WHEN CRMLeadOpportunity IS NOT NULL
      THEN
          LAST_VALUE(CASE WHEN stage = prev THEN NULL ELSE prev END)
          IGNORE NULLS
          OVER (PARTITION BY OriginalSalesLeadId ORDER BY DWKey)
      END AS LastStage
    FROM prev;
    

    With SQL Server < 2022

    You'll sub-SELECT TOP(1) the previous stage from a table where you can filter out rows that did not change the stage, ordered antichronologically: thus you'll fetch the last stage change:

    with
    -- Optional intermediate table to hold the result of the coalesce, for subsequent clarity:
    stage as (select *, coalesce(OpportunityStage, LeadStage) stage from DimSalesLeadAttributes),
    -- Add the previous stage to each entry:
    prev as
    (
        select
            OriginalSalesLeadId, DWKey, stage,
            lag(stage) over (partition by OriginalSalesLeadId order by DWKey) prev
        from stage
    )
    SELECT [DWKey]
      , [ObjectChangeId]
      , [OriginalSalesLeadId]
      , [OpportunityStage]
      , [LeadStage]
      , CASE WHEN CRMLeadOpportunity IS NOT NULL
      THEN
      (
          select top(1) prev
          from prev
          where prev.OriginalSalesLeadId = stage.OriginalSalesLeadId
          and prev.DWKey <= stage.DWKey
          -- And now for the magic: only the row which had a different LastStage:
          and prev.prev <> prev.stage -- No need to include the "or prev is null"s: the select prev would return a null anyway.
          order by prev.DWKey desc
      )
      END AS LastStage
    FROM stage;
    

    See it in an SQL fiddle
    (with an additional stage "Hesitating" to show its handling; and including an alternative answer that simply wraps your query as the one returning a "Last" stage (instead of the "Previous" (= before the last change) one, then lookup for the most recent change on this same, intermediate, table)

    Note that contrary to your expected output, the first row gets no PreviousState:

    DWKey ObjectChangeId OriginalSalesLeadId OpportunityStage LeadStage PreviousStage
    107309 null 20240220 SAL null null
    109442 null 20240220 null Lost SAL
    111224 null 20240220 null Lost SAL
    111458 null 20240220 null Lost SAL
    111730 null 20240220 null Lost SAL
    111983 null 20240220 null Lost SAL
    113011 null 20240220 null Lost SAL