We have a very large number of rows (10's of millions) for a specific customer and I just want to pull 1 row of data for a specific user of the customer per day for a whole month. Here is a sample of the query for a single day.
select update_date, name, city
from mydb.mytable
WHERE year = '2021' and month = '05' and dd = '01'
and customer_id = 'acme' and user_id = 'user123'
limit 1
How do I do this for each day of a month so that I get only 1 row per day for that one customer/user in the most efficient (fastest) manner using PrestDB?
EDIT: I should add that if there is at least 1 row, the single day query returns rather quickly, but if there are no rows for that customer/user for a given day, then it has to look through all the rows to get nothing and that is what takes a long time. I don't think anything but indexing would help solve this, right?
Use row_number()
:
select update_date, name, city
from (select t.*,
row_number() over (partition by user_id, year, month, day order by update_date desc) as seqnum
from mydb.mytable t
where year = '2021' and month = '05' and
user_id = 'user123' and customer_id = 'acme'
) t
where seqnum = 1;