I am trying to write an Athena query that will create pre and post data that can be used for analysis after a marketing e-mail went out.
After the code is ran, I am getting the following results because I have no idea how to tell Athena that if a member doesn't have a transaction during a particular time period, add that row in anyway with their age and just put zero for "net_amount" and "total_transactions":
customer_ID | age | net_amount | total_transactions | Period |
---|---|---|---|---|
ABCDRF | 25 | 85.0 | 2 | Post |
HDWSAZ | 37 | 23.98 | 1 | Pre |
HDWSAZ | 37 | 56.34 | 2 | Post |
OAYDI | 42 | 8.2 | 1 | Pre |
OAYDI | 42 | 98.29 | 4 | Post |
IUSTRA | 31 | 88.62 | 1 | Pre |
MAHRYS | 56 | 34.16 | 2 | Pre |
As you can see, this table above does not have a pre period row for "ABCDRF" and does not have a post period row for either "IUSTRA" and "MAHRYS" but I want to add that in with zeros for the "net_amount" and "total_transactions" and the age inserted with the right value for the Period that is missing (pre or post).
The code I am using is:
select distinct(customer_ID) as customer_ID, age, sum(net_amount) as net_amount, sum(lineitem) as total_visits, (case
when trans_date >= (current_timestamp - interval '18' month) AND trans_date < (current_timestamp - interval '6' month) then 'Post'
when trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '18' month) then 'Pre'
else 'nope'
end) as Period
from trans_history
where trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '6' month)
group by 1, 5
order by 1 desc;
What I want:
customer_ID | age | net_amount | total_transactions | Period |
---|---|---|---|---|
ABCDRF | 25 | 0 | 0 | Pre |
ABCDRF | 25 | 85.0 | 2 | Post |
HDWSAZ | 37 | 23.98 | 1 | Pre |
HDWSAZ | 37 | 56.34 | 2 | Post |
OAYDI | 42 | 8.2 | 1 | Pre |
OAYDI | 42 | 98.29 | 4 | Post |
IUSTRA | 31 | 88.62 | 1 | Pre |
IUSTRA | 31 | 0 | 0 | Post |
MAHRYS | 56 | 34.16 | 2 | Pre |
MAHRYS | 56 | 0 | 0 | Post |
Any idea how I can get these rows added in with this code in Athena? I feel like it is just something I need to add in the "case when" statement but I can't figure it out.
Thanks! Mark
The case
expression is not sufficient here, since some customers do not have data for both periods.
Instead, one approach is to cross join
the list of customers with the fixed, predefined periods in order to generate all possible combinations. Then, we can bring the table with a left join
, then aggregate:
select c.customer_id, c.age, p.period
coalesce(sum(net_amount), 0) as net_amount,
coalesce(sum(lineitem) , 0) as total_visits
from (select distinct customer_id, age from trans_history) c
cross join (
select 'Post' as period
current_timestamp - interval '18' month as start_date,
current_timestamp - interval '6' month as end_date
union all
select 'Pre',
current_timestamp - interval '30' month,
current_timestamp - interval '18' month
) p
left join trans_history t
on t.customer_id = c.customer_id
and t.trans_date >= p.start_date
and t.trans_date < p.end_date
group by c.customer_id, c.age, p.period