Search code examples
sqlpostgresqlrandomgreatest-n-per-group

PostgreSQL. Select a column that correlates with value in the aggregate function


Here is the 'items' table, containing more than 10 rows:

+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 3   | item33    | category1 | 5        |
+-----+-----------+-----------+----------+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+

Values in the 'items' column are unique, the ones in the 'category' columnt - aren't unique.

The task is:

  1. Remove duplicates of categories: if a category contains more than 1 item, take the row with minimal 'id'.
  2. Order results by the 'quantity' (ASC).
  3. Take 10 rows: top 5 and random 5 from the rest result data output.

So, the ordering table (after #2 sub-task) should look like that:

+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+

I know how to exclude duplicates for categories:

SELECT min(id) as id, category
FROM items
GROUP BY category

But I don't know how to order it by the quantity. If I try to add 'quantity' to the 'select' line and then make 'ORDER BY quantity', I get the error: "column "quantity" must appear in the GROUP BY clause or be used in an aggregate function".

If there is a way to add this 'quantity' column to the data output (the value in this column should correlate with the resulting 'id' value (i.e. "min(id)"))? And then do ordering and picking rows...


Solution

  • Consider joining back your aggregate query to the unit level data for all columns including quantity:

    SELECT i.id, i.item_name, i.category, i.quantity
    FROM items i
    INNER JOIN 
      (SELECT category, min(id) AS min_id
       FROM items
       GROUP BY category) agg
     ON i.id = agg.min_id
     AND i.category = agg.category
    ORDER BY i.quantity
    

    For top 5 and random 5 split, integrate a union with CTE to hold the resultset:

    WITH sub AS (
      SELECT i.id, i.item_name, i.category, i.quantity
      FROM items i
      INNER JOIN 
        (SELECT category, min(id) AS min_id
         FROM items
         GROUP BY category) agg
       ON i.id = agg.min_id
       AND i.category = agg.category
    )
    
    -- TOP 5 ROWS
    SELECT id, item_name, category, quantity
    FROM sub
    ORDER BY i.quantity
    LIMIT 5
    
    UNION
    
    -- RANDOM ROWS OF NON-TOP 5
    SELECT id, item_name, category, quantity
    FROM 
      (SELECT id, item_name, category, quantity
       FROM sub
       ORDER BY i.quantity
       OFFSET 5) below5
    ORDER BY random()
    LIMIT 5