Search code examples
sqlteradatagaps-and-islandsansi-sql

Excluding same day drop\adds while preserving the real start and end date


We have a table with status updates for subscriptions to a product. A record is inserted into the table when the subscription begins, and that record is updated with an end date when the subscription ends. One of our systems (no idea which one) sometimes does a "same day drop\add" where it ends the subscription and then begins it again (creating a new record). So the same subscriber ID is attached to multiple records, even though nothing really changed.

Example data would be this:

recID subID   start           end        prodtype
1     19    01/11/2001  01/15/2001    A
2     19    01/15/2001  01/16/2001    A
3     19    01/16/2001  01/20/2001    A
4     19    01/30/2001  01/31/2001    A

This guy started on 1/11 and ended on 1/20. Records 2 and 3 were put in by the system (same day drop add, but weren't really). Record 4 is another subscription Mr. 19 started later.

I have some code that will attempt to resolve only the first (the real) record of each distinct subscription, but it can't find the real end date without using max() and grouping by the subscriber. That of course would show two subscriptions, 1/11 - 1/31 and 1/30 - 1/31, which is wrong.

I'm tearing my hair out trying to resolve this pattern down to two records like this:

subID   start           end        prodtype
 19    01/11/2001   01/20/2001    A
 19    01/30/2001   01/31/2001    A

This is in Teradata, but its just ANSI SQL, I believe.


Solution

  • I believe this is ANSI SQL, but I've only tested it on SQL Server.

    Basically, the query is able to find true start dates and true end dates independently of each other. Then to associate the start date and end dates, associates start dates with end dates that are greater than the start date... and then shows the smallest end date.

    SELECT
        startDates.subId,
        startDates.startDate,
        MIN(endDates.endDate) AS endDate,
        startDates.prodType
    FROM
    (
        SELECT
            recID, subID, startDate, prodType
        FROM yourTable s1
        WHERE NOT EXISTS (
            SELECT 1
            FROM yourTable s2
            WHERE 
                s1.startDate = s2.endDate
                AND s1.subId = s2.subId
        )
    ) startDates JOIN
    (
        SELECT
            recID, subID, endDate, prodType
        FROM yourTable s1
        WHERE NOT EXISTS (
            SELECT 1
            FROM yourTable s2
            WHERE 
                s1.endDate = s2.startDate
                AND s1.subId = s2.subId
        )
    ) endDates ON
        startDates.subID = endDates.subID 
        AND startDates.startDate < endDates.endDate
    GROUP BY
        startDates.subId,
        startDates.startDate,
        startDates.prodType
    

    Here is the query in action...