Search code examples
sqlpostgresqlneo4jgreatest-n-per-groupsql-limit

Minimum number of rows per group


The following is my query:

SELECT *
FROM (
    SELECT f.max, f.min, p.user_id, p.id, p.title, p.rating,
    RANK() OVER (
        PARTITION BY p.user_id
        ORDER BY p.rating DESC, p.id DESC
    ) AS rnk
    FROM posts AS p
    INNER JOIN friends AS f ON (p.user_id = f.friend_id)
    WHERE f.user_id=1
) AS subq
WHERE (subq.rnk <= subq.max)
LIMIT 10

It searches for posts of my friends, sorted by their rating and date. The window function implemented in this query lets me limit the number of rows returned for each friend according to the MAX field on Friends table.

However, I also have a field MIN, which is used to specify the minimum number of posts I want from the query for a given friend. How is that possible?

I also wonder if SQL is the best option for those types of queries? I already tried Neo4j Graph database, and while it seemed as a good solution, I would rather avoid using 2 separate databases.

SQLFiddle

Schema:

CREATE TABLE friends(
    user_id int,
    friend_id int,
    min int,
    max int
);

CREATE TABLE posts(
   id int,
   title varchar(255),
   rating int,
   date date,
   user_id int
);

Suppose we have the following data:

INSERT INTO friends VALUES
  (1,2,1,3)
, (1,3,0,5)
, (1,4,2,10);

INSERT INTO posts VALUES
  (1,  'posts1',  2,  now(), 2)
, (2,  'posts2',  1,  now(), 2)
, (3,  'posts3',  5,  now(), 2)
, (4,  'posts4',  2,  now(), 2)
, (5,  'posts5',  11, now(), 2)
, (6,  'posts6',  7,  now(), 2)
, (7,  'posts7',  3,  now(), 2)
, (8,  'posts8',  4,  now(), 3)
, (9,  'posts9',  1,  now(), 3)
, (10, 'posts10', 0,  now(), 3)
, (11, 'posts11', 7,  now(), 3)
, (12, 'posts12', 3,  now(), 3)
, (13, 'posts13', 2,  now(), 3)
, (14, 'posts14', 4,  now(), 4)
, (15, 'posts15', 9,  now(), 4)
, (16, 'posts16', 0,  now(), 4)
, (17, 'posts17', 3,  now(), 4)
, (18, 'posts18', 2,  now(), 4)
, (19, 'posts19', 1,  now(), 4)
, (20, 'posts20', 2,  now(), 4);

Hence I would like to see (post_id, title, rating, date, friend_id) combinations with the following conditions, if possible:

  1. between 1 and 3 posts from the friend with id=2
  2. between 0 and 5 posts from the friend with id=3
  3. between 2 and 10 posts from the friend with id=4

So basically, if my friend with friend_id=2 posted 1 or more articles, I want at least 2 of them. If he posted more than 3 articles, I want no more than 3.


Solution

  • Let's say I want to have 2-5 posts from you everyday, if you post that much. If you post only one, it is alright, and I will have the only one post.

    Your explanation in the comment still does not add up. Your min number would be noise without effect according to this explanation.

    This is not what you wrote, but this would make sense:

    Given a maximum of display slots for posts (the outer LIMIT), I want to get min posts from each friend first (if available). If there are free slots after that, fill up with up to max posts per friend.

    In the example that would be 1 (min) post from friend 2 with top priority and another 2 (max - min) posts if more slots are still available.

    It would be arbitrary which posts make the cut if there are not enough slots for each priority. I went ahead and assumed that the first post from each should be selected first, etc.

    The rest is still arbitrary, but can be solved easily if you manage to formulate a requirement.

    SELECT *
    FROM   friends f
    ,  LATERAL (
       SELECT *
            , row_number() OVER (ORDER  BY rating DESC NULLS LAST, id DESC) AS rn
       FROM   posts p
       WHERE  user_id = f.friend_id  -- LATERAL reference
       ORDER  BY rating DESC NULLS LAST, date DESC NULLS LAST
       LIMIT  f.max  -- LATERAL reference
       ) p
    WHERE  f.user_id = 1
    ORDER  BY (p.rn > f.min)  -- minimum posts from each first
            , p.rn
    LIMIT  10;  -- arbitrary total maximum
    

    SQL Fiddle.

    Notes