Users
can have many Articles
.
I want to get 20 of the most recent users and for each user, 5 articles.
I was reading on http://wiki.postgresql.org/wiki/Find_recent_activity but this seems overly complicated for my scenario. However, the query mentioned in the article is stunningly fast! So I've been told. Perhaps there is a way to incorporate some of the methodologies used in the query?
I am using Postgres 9.2
Assuming that "most recent" user means "most recently created":
First, generate some dummy data. I haven't bothered prettily formatting this:
create table users ( id serial primary key, username text not null,created_at timestamptz not null default current_timestamp );
create table articles (id serial primary key, user_id integer not null references users(id), created_at timestamptz not null default current_timestamp);
insert into users (username) values ('todd'),('bob'),('janet'),('joan'),('jim'),('jolly'),('roger'),('yarr'),('fred');
update users set created_at = current_timestamp + id * INTERVAL '1' MINUTE;
insert into articles(user_id, created_at) select u.id, x from users u cross join generate_series(current_timestamp, current_timestamp + INTERVAL '1' HOUR, INTERVAL '1' MINUTE) x;
LATERAL
Now, this is one of those times where it's a pity you're not on 9.3; there you could use a lateral subquery to do all this rather nicely:
SELECT u.username, a.id AS article_id
FROM (
SELECT u1.id, u1.username
FROM users u1
ORDER BY u1.created_at DESC LIMIT 5
) u,
LATERAL (
SELECT a1.id
FROM articles a1
WHERE a1.user_id = u.id
ORDER BY a1.created_at DESC LIMIT 5
) a;
See: http://www.depesz.com/2012/08/19/waiting-for-9-3-implement-sql-standard-lateral-subqueries/
However, since 9.3 isn't quite released yet, it's not surprising you aren't using it.
With 9.2 and older you have to use another layer of subqueries to work around the lack of LATERAL
support you're in for some ugly workarounds involving the row_number
window function and nested subqueries. See Grouped LIMIT in PostgreSQL: show the first N rows for each group?, http://www.postgresql.org/message-id/4CD0B077.2080700@ateb.com, http://troels.arvin.dk/db/rdbms/#select-top-n
Something like:
WITH
last_five_users AS (
SELECT u1.id, u1.username FROM users u1 ORDER BY u1.created_at DESC LIMIT 5
)
SELECT
lfa.username, lfa.article_id
FROM
(
SELECT lfive.username, lfive.id, a.id, row_number() OVER (PARTITION BY a.user_id ORDER BY created_at)
FROM articles a
INNER JOIN last_five_users lfive ON (a.user_id = lfive.id)
) AS lfa(username, user_id, article_id, rownum)
WHERE lfa.rownum <= 10;
(in this case for five users and 10 articles per user).
Efficiency will be awful if the users selected have lots of articles, because it will fetch and number all articles for those users, not just the first n
, then discard most of them in the outer query.
If that's a problem you could create an SQL set returning function:
CREATE OR REPLACE FUNCTION last_n_articles_for_user(user_id integer, n_articles integer)
RETURNS SETOF articles AS $$
SELECT * FROM articles WHERE articles.user_id = $1 ORDER BY created_at LIMIT $2
$$ LANGUAGE sql STABLE;
then use that in your main query:
SELECT
o.username,
(o.last_articles).*
FROM (
SELECT
u.username,
last_n_articles_for_user(u.id, 10) AS last_articles
FROM (
SELECT *
FROM users u1
ORDER BY u1.created_at DESC
LIMIT 5
) u
) AS o;
This is likely to perform a great deal better where there's an index on created_at
, though not as well as the LATERAL
approach enabled in 9.3. The extra layer of subquery is required because invoking last_n_articles_for_user
directly with the .*
syntax will - due to an internal PostgreSQL limitation on row-returning functions - cause it to be invoked once for each column.