Search code examples
sqlstatisticssnowflake-cloud-data-platformanalysis

Customer retention rate in snowflake


I have this formula for the customer retention rate but kinda stuck in translating it into a sql code in snowflake:

Customer Retention rate: number of customers who purchased in the past AND in the period of [last 30 days] / number of customers who have purchased in the past [not including last 30 days]


Solution

  • if we check for each customer if they have "old sales" and "new sales"

    select customer_id, min(date) as min_date, max(date) as max_date, min_date < (current_date()-30) as old_sales, max_date >= (current_date()-30) as new_sales
    from values
        (1,'2022-05-01'),
        (2,'2022-05-01'),
        (2,'2022-04-01'),
        (3,'2022-04-01'),
        (4,'2022-04-01'),
        (4,'2022-03-01')
        t(customer_id, date)
    group by 1
    

    we get:

    CUSTOMER_ID MIN_DATE MAX_DATE OLD_SALES NEW_SALES
    1 2022-05-01 2022-05-01 FALSE TRUE
    2 2022-04-01 2022-05-01 TRUE TRUE
    3 2022-04-01 2022-04-01 TRUE FALSE
    4 2022-03-01 2022-04-01 TRUE FALSE

    we then want to count both as true for the top, and trop/false for the bottom, we can use:

    select count_if(old_sales and new_sales) as top
        ,count_if(old_sales and  new_sales=false) as bottom
        ,div0(top, bottom) as rention
    from (
        select customer_id, min(date) as min_date, max(date) as max_date, min_date < (current_date()-30) as old_sales, max_date >= (current_date()-30) as new_sales
        from values
            (1,'2022-05-01'),
            (2,'2022-05-01'),
            (2,'2022-04-01'),
            (3,'2022-04-01'),
            (4,'2022-04-01'),
            (4,'2022-03-01')
            t(customer_id, date)
        group by 1
    );
    

    and get:

    TOP BOTTOM RENTION
    1 2 0.5

    thus with a data_table as our source we can use:

    select div0(count_if(old_sales and new_sales), count_if(old_sales and  new_sales=false)) as rention
    from (
        select 
            min(date) < (current_date()-30) as old_sales, 
            max(date) >= (current_date()-30) as new_sales
        from data_table
        group by customer_id
    );
    

    to get the magic 50%