Suppose the following:
create schema bv;
create table bv.user(id bigint primary key);
create table bv.user_photo (
id bigint primary key,
url varchar(255) not null,
user_id bigint references bv.user(id)
);
insert into bv.user values (100), (101);
insert into bv.user_photo values
(1, 'https://1.com', 100),
(3, 'https://3.com', 100),
(4, 'https://4.com', 101),
(2, 'https://2.com', 100);
I'd like to query for and build an object for every user, and include only the latest image in the result.
Here's what I have:
select
json_build_object(
'id', u.id,
'latest_image', up.url
) user
from bv.user u
left join bv.user_photo up
on u.id = up.user_id
However this returns:
[
{"id" : 100, "url" : "https://2.com"},
{"id" : 100, "url" : "https://3.com"},
{"id" : 100, "url" : "https://1.com"},
{"id" : 101, "url" : "https://4.com"}
]
However, the expected result is:
[
{"id" : 100, "url" : "https://3.com"},
{"id" : 101, "url" : "https://4.com"}
]
I've tried using distinct
:
select distinct on(u.id)
json_build_object(
'id', u.id,
'url', up.url
) user
from bv.user u
left join bv.user_photo up
on u.id = up.user_id
order by u.id, up.id DESC
But my question is whether or not this is the correct approach? I feel like I shouldn't be using distinct
in such a situation.
With few photos per user, and while you return all (or most) users, DISTINCT ON
is the best approach.
But it's typically faster to get distinct photos before you join:
SELECT json_build_object('id', u.id, 'url', p.url) AS "user"
FROM users u
LEFT JOIN (
SELECT DISTINCT ON (user_id)
user_id, url
FROM user_photo
ORDER BY user_id, id DESC
) p ON p.user_id = u.id;
For many photos per user, an emulated index-skip scan is (much) faster. See:
To produce one summary array, you can skip json_build_object()
. json_agg()
can aggregate the row directly:
SELECT json_agg(sub) AS "users"
FROM (
SELECT u.id, p.url
FROM users u
LEFT JOIN (
SELECT DISTINCT ON (user_id)
user_id, url
FROM user_photo
ORDER BY user_id, id DESC
) p ON p.user_id = u.id
) sub;
Notably, all queries so far include the key "url" with a null value where no photo is found. You may want to strip the noise:
SELECT json_strip_nulls(json_agg(sub)) AS "users"
FROM ...
This seems to be your use case.
A LATERAL
subquery is typically faster. Also deals with many photos per user efficiently!
SELECT json_build_object('id', u.id, 'url', p.url) AS "user"
FROM users u
LEFT JOIN LATERAL (
SELECT up.url
FROM user_photo up
WHERE up.user_id = u.id
ORDER BY up.id DESC
LIMIT 1
) p ON true
WHERE u.id IN (100, 101); -- small selection
SELECT json_strip_nulls(json_agg(sub)) AS "users"
FROM (
SELECT u.id, p.url
FROM users u
LEFT JOIN LATERAL (
SELECT up.url
FROM user_photo up
WHERE up.user_id = u.id
ORDER BY up.id DESC
LIMIT 1
) p ON true
WHERE u.id IN (100, 101) -- small selection
) sub;
DISTINCT ON
does not need an index. All other queries absolutely need an index on user_photo(user_id, id)
. Or even, ideally:
CREATE INDEX ON user_photo (user_id, id DESC) INCLUDE (url);
Aside: Don't use the reserved word "user" as identifier. It works while schema-qualified, but fails without.