Search code examples
amazon-redshiftperiscope

How can I make the denominator a constant for each of the numbers in the same row in SQL?


I am trying to create a table with the average amount of sales divided by a cohort of users that signed up for an account in a certain month, however, I can only figure out to divide by the number of people that made a purchase in that specific month which is lower than the total amount of the cohort. How do I change the query below to make each of the avg_sucessful_transacted amounts divide by cohort 0 for each month?

thank you.

select sum (t.amount_in_dollars)/ count (distinct u.id)  as Avg_Successful_Transacted, (datediff(month,[u.created:month],[t.createdon:month])) as Cohort, [u.created:month] as Months,
count (distinct u.id) as Users
from [transaction_cache as t]
left join [user_cache as u] on t.owner = u.id
where t.type = 'savings' and t.status = 'successful' and [u.created:year] > ['2017-01-01':date:year]
group by cohort, months
order by Cohort, Months

Solution

  • You will need to break out the cohort sizing into its own subquery or CTE in order to calculate the total number of distinct users who were created during the month which matches the cohort's basis month.

    I approached this by bucketing users by the month they were created using the date_trunc('Month', <date>, <date>) function, but you may wish to approach it differently based on the specific business logic that generates your cohorts.

    I don't work with Periscope, so the example query below is structured for pure Redshift, but hopefully it is easy to translate the syntax into Periscope's expected format:

    WITH cohort_sizes AS (
        SELECT date_trunc('Month', created)::DATE AS cohort_month
             , COUNT(DISTINCT(id)) AS cohort_size
        FROM user_cache u
        GROUP BY 1
    ),
    
    cohort_transactions AS (
        SELECT date_trunc('Month', created)::DATE AS cohort_month
             , createdon
             , owner
             , type
             , status
             , amount_in_dollars
             , id
             , created
        FROM transaction_cache t
            LEFT JOIN user_cache u ON t.owner = u.id
        WHERE t.type = 'savings'
            AND t.status = 'successful'
            AND u.created > '2017-01-01'
    )
    
    SELECT SUM(t.amount_in_dollars) / s.cohort_size AS Avg_Successful_Transacted
         , (datediff(MONTH, u.created, t.createdon)) AS Cohort
         , u.created                                                 AS Months
         , count(DISTINCT u.id)                                      AS Users
    FROM cohort_transactions t
        JOIN cohort_sizes s ON t.cohort_month = s.cohort_month
        LEFT JOIN user_cache AS u ON t.owner = u.id
    GROUP BY s.cohort_size, Cohort, Months
    ORDER BY Cohort, Months
    ;