The following is my query:
SELECT *
FROM (
SELECT f.max, f.min, p.user_id, p.id, p.title, p.rating,
RANK() OVER (
PARTITION BY p.user_id
ORDER BY p.rating DESC, p.id DESC
) AS rnk
FROM posts AS p
INNER JOIN friends AS f ON (p.user_id = f.friend_id)
WHERE f.user_id=1
) AS subq
WHERE (subq.rnk <= subq.max)
LIMIT 10
It searches for posts of my friends, sorted by their rating and date. The window function implemented in this query lets me limit the number of rows returned for each friend according to the MAX
field on Friends
table.
However, I also have a field MIN
, which is used to specify the minimum number of posts I want from the query for a given friend. How is that possible?
I also wonder if SQL is the best option for those types of queries? I already tried Neo4j Graph database, and while it seemed as a good solution, I would rather avoid using 2 separate databases.
Schema:
CREATE TABLE friends(
user_id int,
friend_id int,
min int,
max int
);
CREATE TABLE posts(
id int,
title varchar(255),
rating int,
date date,
user_id int
);
Suppose we have the following data:
INSERT INTO friends VALUES
(1,2,1,3)
, (1,3,0,5)
, (1,4,2,10);
INSERT INTO posts VALUES
(1, 'posts1', 2, now(), 2)
, (2, 'posts2', 1, now(), 2)
, (3, 'posts3', 5, now(), 2)
, (4, 'posts4', 2, now(), 2)
, (5, 'posts5', 11, now(), 2)
, (6, 'posts6', 7, now(), 2)
, (7, 'posts7', 3, now(), 2)
, (8, 'posts8', 4, now(), 3)
, (9, 'posts9', 1, now(), 3)
, (10, 'posts10', 0, now(), 3)
, (11, 'posts11', 7, now(), 3)
, (12, 'posts12', 3, now(), 3)
, (13, 'posts13', 2, now(), 3)
, (14, 'posts14', 4, now(), 4)
, (15, 'posts15', 9, now(), 4)
, (16, 'posts16', 0, now(), 4)
, (17, 'posts17', 3, now(), 4)
, (18, 'posts18', 2, now(), 4)
, (19, 'posts19', 1, now(), 4)
, (20, 'posts20', 2, now(), 4);
Hence I would like to see (post_id, title, rating, date, friend_id)
combinations with the following conditions, if possible:
id
=2id
=3id
=4So basically, if my friend with friend_id=2
posted 1 or more articles, I want at least 2 of them. If he posted more than 3 articles, I want no more than 3.
Let's say I want to have 2-5 posts from you everyday, if you post that much. If you post only one, it is alright, and I will have the only one post.
Your explanation in the comment still does not add up. Your min
number would be noise without effect according to this explanation.
This is not what you wrote, but this would make sense:
Given a maximum of display slots for posts (the outer LIMIT
), I want to get min
posts from each friend first (if available). If there are free slots after that, fill up with up to max
posts per friend.
In the example that would be 1 (min
) post from friend 2 with top priority and another 2 (max - min
) posts if more slots are still available.
It would be arbitrary which posts make the cut if there are not enough slots for each priority. I went ahead and assumed that the first post from each should be selected first, etc.
The rest is still arbitrary, but can be solved easily if you manage to formulate a requirement.
SELECT *
FROM friends f
, LATERAL (
SELECT *
, row_number() OVER (ORDER BY rating DESC NULLS LAST, id DESC) AS rn
FROM posts p
WHERE user_id = f.friend_id -- LATERAL reference
ORDER BY rating DESC NULLS LAST, date DESC NULLS LAST
LIMIT f.max -- LATERAL reference
) p
WHERE f.user_id = 1
ORDER BY (p.rn > f.min) -- minimum posts from each first
, p.rn
LIMIT 10; -- arbitrary total maximum
Assuming friends.user_id
and posts.id
to be primary keys. Your table definition is lacking there.
All other columns should be defined NOT NULL
to make sense.
Use a LATERAL
join to select only max
postings per friend in the subquery:
Use row_number()
, not rank()
in the subquery. It's a common mistake to confuse both.
You mentioned date
but it did not show in your query. Maybe you really want:
, row_number() OVER (ORDER BY rating DESC NULLS LAST
, date DESC NULLS LAST) AS rn
DESC NULLS LAST
only because rating
and date
could be NULL:
In Postgres, you can use a simple boolean expression in ORDER BY
:
ORDER BY (p.rn > f.min), p.rn
That puts min
posts per friend first. The second item (p.rn
) gives each friend an equal chance (first post first etc.).
Don't use as identifier. It's a reserved word in standard SQL and a basic type name in Postgres.date