Using Rails, I'm trying to perform a SQL command to return an array of rows that contain the highest value for a certain user per day.
For example:
user_id(integer) | created_at(datetime) | score(integer)
-------------------+--------------------------------+---------------
1 | "2015-07-27 21:35:24" | 100
1 | "2015-07-27 21:35:24" | 123
2 | "2015-07-27 21:35:24" | 101
2 | "2015-07-27 21:35:24" | 122
3 | "2015-07-27 21:35:24" | 103
3 | "2015-07-27 21:35:24" | 115
1 | "2015-07-26 21:35:24" | 116
1 | "2015-07-26 21:35:24" | 151
2 | "2015-07-26 21:35:24" | 122
2 | "2015-07-26 21:35:24" | 134
3 | "2015-07-26 21:35:24" | 123
3 | "2015-07-26 21:35:24" | 111
1 | "2015-07-25 21:35:24" | 129
1 | "2015-07-25 21:35:24" | 152
2 | "2015-07-25 21:35:24" | 120
2 | "2015-07-25 21:35:24" | 109
3 | "2015-07-25 21:35:24" | 142
3 | "2015-07-25 21:35:24" | 131
Expected results:
user_id(integer) | created_at(datetime) | score(integer)
-------------------+--------------------------------+---------------
1 | "2015-07-27 21:35:24" | 123
2 | "2015-07-27 21:35:24" | 122
3 | "2015-07-27 21:35:24" | 115
1 | "2015-07-26 21:35:24" | 151
2 | "2015-07-26 21:35:24" | 134
3 | "2015-07-26 21:35:24" | 123
1 | "2015-07-25 21:35:24" | 152
2 | "2015-07-25 21:35:24" | 120
3 | "2015-07-25 21:35:24" | 142
I've been combining various joins, having
, and other methods, but to no avail. I can't get it to filter the results. I've made a little bit of progress by select
ing max values per day, but then I can't filter out the lower ones on a per user_id
basis. I managed to do it using multiple group_by
and map
in Rails, but it's super slow since it has to re-iterate over the entire array, and since there is a lot of records, it can take some time.
EDIT:
My solution was as follows:
all_scores_in_time_period = UserScore
.where("EXTRACT(MONTH FROM created_at) = ?", Date::MONTHNAMES.index(params[:month_control]))
.where("EXTRACT(YEAR FROM created_at) = ?", params[:year_control])
.select("DISTINCT ON(DATE(created_at), user_id) *")
.order("DATE(created_at) desc")
This first filters by month/year, and then returns a list of users by their top score per user per day.
You can use distinct on ()
which is a Postgres extension to the distinct
operator:
select distinct on (user_id, created_at) user_id, created_at, score
from the_table
order by user_id, created_at, score desc;
If you want solution using standard SQL, this can also be achieved using window functions:
select user_id, created_at, score
from (
select user_id, created_at, score,
row_number() over (partition by user_id, created_at order by score desc) as rn
from the_table
) as t
order by user_id, created_at;
The solution with distinct on()
is typically faster in Postgres.
With a window function you could also handle ties: when a user has the same (highest) score more than once on a day. The solution with row_number()
will only return a single row per (user_id, created_at). If you want all rows with the same (highest) score, you need to use dense_rank()
instead.
Edit
If you want to ignore the time part of the timestamp column, just cast it to a date:
created_at::date