Search code examples
sqlamazon-athenaprestotrino

SQL How do I generate a row with value 0 of customers in days without order


I would like to create a query that takes all the dates of the year 2023 (in this case I'm using the order table as an auxiliary table, which is for order registration to get the dates, it always has a date for every day) and then give left join with all client ids how many orders you have each day and if you don't have any orders that day, return 0. I put the client_id 10552 as an example, but there would be several others.

I'm using only one table, the order, which has the orders placed, where I'm going to use it to get the dates of the year and also the order quantity by client_id.

Example of some information that comes from the order table when I do a select * from order:

created_at order_status_id client_id
2018-04-05 22:19:29.000 3 10,282
2018-04-22 23:47:36.000 3 2
2018-04-25 23:42:22.000 3 11
2018-04-06 19:54:33.000 4 2
2018-04-09 12:11:06.000 3 2

I'm trying to use the following query to do what I want:

with Datas as (
select distinct 
   CAST(o.created_at AS DATE) datas
from platform_bs."order" o
where date_format(o.created_at, '%Y') = '2023'
order by 1
),-- here I put it to list all the dates from 2023 until now
count_orders_client as(
SELECT
   datas,
   od.client_id,
   nullif(count(od.id),0) as count
from Datas as dt
left join platform_bs."order" od ON dt.datas = CAST(od.created_at AS DATE)
group by 1,2
order by 1,2
)-- here I do a left join to get the dates, client id and a count and where should I do it if there were no records on the day, it would return 0
select * from count_orders_client where client_id = 10552

I would like it to return this:

datas client_id count
2023-01-01 10552 1
2023-01-02 10552 1
2023-01-03 10552 0
2023-01-04 10552 0
2023-01-05 10552 0
2023-01-06 10552 0
2023-01-07 10,552 1
2023-01-08 10,552 2
2023-01-09 10552 0
2023-01-10 10552 0
2023-01-11 10552 0
2023-01-12 10552 0

But when run it returns like this:

datas client_id count
2023-01-01 10,552 1
2023-01-02 10,552 1
2023-01-07 10,552 1
2023-01-08 10,552 2
2023-01-13 10,552 1
2023-01-23 10,552 1
2023-02-27 10,552 1

I'm not getting the part of the left join joining the dates of the year 2023 with the order table to return the values ​​0. What should I do?

Edit1: AWS Athena is being used


Solution

  • Usually to generate date ranges (up to 10000 elements) I use sequence (it makes everything a bit clear and manageable), so you can try the following:

    -- sample data
    with dataset(created_at, order_status_id, client_id) as (
        values (timestamp '2023-04-05 22:19:29.000',    3,  '11'),
            (timestamp '2023-04-06 23:47:36.000',   3,  '2'),
            (timestamp '2023-04-07 23:42:22.000',   3,  '11'),
            (timestamp '2023-04-06 19:54:33.000',   4,  '2'),
            (timestamp '2023-04-08 12:11:06.000',   3,  '2')
    ),
    -- query parts
    clients as ( -- find all unique clients
        select distinct client_id
        from dataset
    ),
    client_dates as ( -- create cartesian product of clients and dates
        select *
        from clients,
        -- use here needed start and end for `sequence`:
        unnest(sequence(date '2023-04-04', date '2023-04-09', interval '1' day)) as t(dt) 
    
    )
    
    select cd.client_id, dt, count(order_status_id) count
    from client_dates cd
    left join dataset d
        on cd.dt = date(created_at) and cd.client_id = d.client_id
    group by cd.client_id, cd.dt;
    

    Output:

    client_id dt count
    2 2023-04-04 0
    2 2023-04-05 0
    2 2023-04-06 2
    2 2023-04-07 0
    2 2023-04-08 1
    2 2023-04-09 0
    11 2023-04-04 0
    11 2023-04-05 1
    11 2023-04-06 0
    11 2023-04-07 1
    11 2023-04-08 0
    11 2023-04-09 0