Search code examples
sqlsql-servert-sqlcount

Getting a count dynamically for Customers first time orders and also orders lost after last order date + 4 months, per year and month


I am trying to dynamically get a count at a per month basis per customer and type. Where two of the counts are the first time we got an order based on the program and the other two are programs/customers we lost based on their last order date and 4 or 5 months have passed since.

The new customers per month is correct but the lost customers is not in both queries I have created.

There is a few issues with the data, one the invoicing data that I am using doesn't have a unique identifier so to create one I am using concat with First name, Last name, DOB and Type of program.

I.E FirstLast2024-09-12Type

Second issue I cant get around is the count for lost customers, I have been trying multiple ways and landed on the below two queries and have got different results for lost customers with each query. The first orders in my data set is 2021-01 so there should be no lost customers for those first months as no ones last order could be before 2021-01. I am hoping it is something small but even after taking a few days break from it I can't wrap my head around it.. First Query:

with dates as (
      select convert(date, '2019-01-01') as dt
      union all
      select dateadd(month, 1, dt)
      from dates
      where dt < getdate()
     )
Select  left(dt,7) as [date],count(*) as [new_customers], sum(case when right(b.[id],3) = 'IPP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_IPP_customers], 
sum(case when right(b.[id],3) = 'MAP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_MAP_customers],
sum(case when right(b.[id],3) = 'IPP' and datediff(m,dt,b.[Last]) > 1 then 1 else 0 end) as [Lost_IPP_customers],
sum(case when right(b.[id],3) = 'MAP' and datediff(m,dt,b.[Last]) > 1 then 1 else 0 end) as [Lost_MAP_customers]
from dates
Left Join(
Select CONCAT(a.[First Name],a.[Last Name],a.[DOB],a.Solution) as [id], min(dos) as [1st_order],max(llast) as [last]
FROM [MYTABLE] a
Left Join(
Select CONCAT([First Name],[Last Name],[DOB],Solution) as [id], dateadd(m,4,max(dos)) as [llast]
FROM [MYTABLE] a
Group by CONCAT([First Name],[Last Name],[DOB],[Solution])) b on b.id = CONCAT(a.[First Name],a.[Last Name],a.[DOB],a.Solution)
Group by CONCAT(a.[First Name],a.[Last Name],a.[DOB],a.Solution)) b on left(b.[1st_order],7) = left(dt,7)
where b.id is not null
Group by left(dt,7)
order by left(dt,7)

Results:

1st query results

Second query: just tried different ways to join etc..

with dates as (
      select convert(date, '2019-01-01') as dt
      union all
      select dateadd(month, 1, dt)
      from dates
      where dt < getdate()
     )
Select  left(dt,7) as [date],count(*) as [new_customers], sum(case when right(b.[id],3) = 'IPP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_IPP_customers], 
sum(case when right(b.[id],3) = 'MAP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_MAP_customers],

sum(case when right(c.[id],3) = 'IPP' and datediff(m,dt,c.[Last_order]) > 5 then 1 else 0 end) as [Lost_IPP_customers],
sum(case when right(c.[id],3) = 'MAP' and datediff(m,dt,c.[Last_order]) > 5 then 1 else 0 end) as [Lost_MAP_customers]
from dates
Left Join(
Select a.[id],  min(dos) as [1st_order]
From
(Select *,CONCAT([First Name],[Last Name],[DOB],Solution) as [id]
FROM [MYTABLE) a
Group by a.[id]) b on left(b.[1st_order],7) = left(dt,7) 
Left Join(
Select a.[id],  max(dos) [Last_order]
From
(Select *,CONCAT([First Name],[Last Name],[DOB],Solution) as [id]
FROM [MYTABLE]) a
Group by a.[id]) c on b.[id] = c.id 
where c.id is not null
Group by left(dt,7)
order by left(dt,7)

Results: 2nd Query Results

Note: Count * was just to make sure the new customers added up for visual purposes.

Here is example of how the data set looks before joining to a calendar table and sum cases: First and Last orders

The new customers per month is correct but the lost customers is not in both queries I have created.

EDIT: I failed to notice the first query sum counts for first and lost are the same count as each other, the second query is what i started with and I tried to make it simpler and created the first query as the second query wasn't populating the lost customers correctly. Here is example data from "mytable" I blocked off parts of the data for privacy.. I also added two tables to the post of example data and What I am trying to accomplish with that data provided.

MYtable_Data

Example data:

First Name Last Name DOB Member Number DOS Medication Solution Day Supply Insurance Amount Due Payment Status Invoice # Inv# Date
Name Fake 1968-01-11 NULL 2023-10-26 Ocrevus MAP 180 0.00 9890.40 PAID 111 2023-11-09
Name Fake 1968-01-11 NULL 2022-10-31 Ocrevus MAP 180 0.00 8723.9825 PAID 222 2022-11-18
Name Fake 1968-01-11 NULL 2023-04-29 Ocrevus MAP 180 0.00 9890.40 PAID 333 2023-05-12
Name1 Fake1 1998-03-22 NULL 2022-06-02 Novolog Flextouch IPP 90 0.00 541.08 PAID 123456 2022-06-22
Name1 Fake1 1998-03-22 NULL 2021-06-04 Novolog IPP 90 0.00 541.62 PAID 123 2021-07-20
Name1 Fake1 1998-03-22 NULL 2021-06-04 Basaglar IPP 90 0.00 551.76 PAID 123 2021-07-20
Name1 Fake1 1998-03-22 NULL 2022-03-08 Lantus Solostar Pens IPP 90 0.00 665.97 PAID 12368 2022-04-20
Name2 Fake2 1982-11-16 NULL 2022-09-29 Creon MAP 90 0.00 1235.07 PAID 3698 2022-11-04
Name2 Fake2 1982-11-16 NULL 2023-10-10 Creon IPP 100 0.00 1874.56 PAID 516 2023-10-27
Name2 Fake2 1982-11-16 NULL 2023-06-26 Creon MAP 90 0.00 1238.00 PAID 44 2023-08-07

Using the example data above the output should look like below with the lost customer count being 4 months after the DOS (but the real results would have every year and month in order from x number of years ago to today's month, I shortened the results for viewing purposes.

date new_IPP_customers new_MAP_customers Lost_IPP_customers Lost_MAP_customers
2021-06 1 0 0 0
2022-09 0 1 0 0
2022-10 0 1 0 1
2023-10 1 0 0 1
2024-02 0 0 1 1

Solution

  • Something like this can be a good start perhaps:

    --Data
    SELECT  *
    INTO #data
    FROM    (
        VALUES  (N'Name', N'Fake', N'1968-01-11', NULL, N'2023-10-26', N'Ocrevus', N'MAP', 180, 0.00, 9890.40, N'PAID', 111, N'2023-11-09')
        ,   (N'Name', N'Fake', N'1968-01-11', NULL, N'2022-10-31', N'Ocrevus', N'MAP', 180, 0.00, 8723.9825, N'PAID', 222, N'2022-11-18')
        ,   (N'Name', N'Fake', N'1968-01-11', NULL, N'2023-04-29', N'Ocrevus', N'MAP', 180, 0.00, 9890.40, N'PAID', 333, N'2023-05-12')
        ,   (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2022-06-02', N'Novolog Flextouch', N'IPP', 90, 0.00, 541.08, N'PAID', 123456, N'2022-06-22')
        ,   (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2021-06-04', N'Novolog', N'IPP', 90, 0.00, 541.62, N'PAID', 123, N'2021-07-20')
        ,   (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2021-06-04', N'Basaglar', N'IPP', 90, 0.00, 551.76, N'PAID', 123, N'2021-07-20')
        ,   (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2022-03-08', N'Lantus Solostar Pens', N'IPP', 90, 0.00, 665.97, N'PAID', 12368, N'2022-04-20')
        ,   (N'Name2', N'Fake2', N'1982-11-16', NULL, N'2022-09-29', N'Creon', N'MAP', 90, 0.00, 1235.07, N'PAID', 3698, N'2022-11-04')
        ,   (N'Name2', N'Fake2', N'1982-11-16', NULL, N'2023-10-10', N'Creon', N'IPP', 100, 0.00, 1874.56, N'PAID', 516, N'2023-10-27')
        ,   (N'Name2', N'Fake2', N'1982-11-16', NULL, N'2023-06-26', N'Creon', N'MAP', 90, 0.00, 1238.00, N'PAID', 44, N'2023-08-07')
    ) t ([First Name],[Last Name],DOB,[Member Number],DOS,Medication,Solution,[Day Supply],Insurance,[Amount Due],[Payment Status],[Invoice #],[Inv# Date])
    
    -- Code
    ;WITH dates AS (
          SELECT    CONVERT(date, '2019-01-01') AS dt
          UNION ALL
          SELECT    dateadd(month, 1, dt)
          FROM  dates
          WHERE dt < getdate()
         )
    SELECT  CONVERT(VARCHAR(6), dt, 112)
    ,   ISNULL(SUM(CASE WHEN solution = 'IPP' THEN x.gained END), 0) AS IPPGained
    ,   ISNULL(SUM(CASE WHEN solution = 'IPP' THEN x.lost END), 0) AS IPPLost
    ,   ISNULL(SUM(CASE WHEN solution = 'MAP' THEN x.gained END), 0) AS MAPGained
    ,   ISNULL(SUM(CASE WHEN solution = 'MAP' THEN x.lost END), 0) AS MAPLost
    FROM    dates d
    LEFT JOIN 
    (
        SELECT  sa.*
        FROM    (
            SELECT  CASE WHEN lag(dos) OVER(partition BY CONCAT([First Name], [Last Name], DOB, Solution) ORDER BY DOS) IS NULL THEN 1 ELSE 0 END AS first
            ,   CASE WHEN LEAD(dos) OVER(partition BY CONCAT([First Name], [Last Name], DOB, Solution) ORDER BY DOS) IS NULL THEN 1 ELSE 0 END AS last
            ,   *
            FROM    #data d
        ) d
        CROSS apply (
            SELECT  solution, first AS gained, 0 AS lost, DOS
            WHERE   first = 1
        UNION ALL
            SELECT  solution, 0, 1, DATEADD(MONTH, 4, Dos)
            WHERE   last = 1
        ) sa
        ) x
        ON  EOMONTH(x.DOS) = EOMONTH(d.dt)
    GROUP BY CONVERT(VARCHAR(6), dt, 112)
    

    The core of the idea is to create two flags, first and last order.

    I use LAG/LEAD window function to do so, and PARTITION BY the identifiers like First, last, Solution to distinguish the different groups of customers.

    Now, to simplify the gain/losses, I create two rows:

    • First row, which has 1 gained and 0 loss, DOS as order date
    • Last row, which has 0 gained, 1 loss and DOS 4 months into the future

    This simplifies our logic later.

    Finally, i join back on the dates CTE and use conditional aggregation (CASE WHEN ...) to generate the tabulated format you're looking for.

    For the date join, i use EOMONTH to simplify the date handling, this function returns the last date of the month ie. a single date. Feel free to use something else, i wasn't going for the most performant solution, but rather to demonstrate main concept.