Search code examples
sqlpostgresqlpostgresql-performancerelational-division

Select rows with a specific column greater than zero in some related rows


I have the following tables in a Postgres 9.5 database:

product_days

Column     | Type    | Modifiers
-----------+---------+----------------------------------------------
id         | integer | not null default nextval('product_days_id_seq'::regclass)
days_id    | integer |
product_id | integer |
available  | integer | 
price      | integer |
 Indexes:
"pk_product_days" PRIMARY KEY, btree (id)
"idx_product_days" btree (days_id)
"idx_product_days_0" btree (product_id)
Foreign-key constraints:
   "product_days_days_id_fkey" FOREIGN KEY (days_id) REFERENCES days(id)

product

 Column         |            Type             |                       Modifiers
----------------+-----------------------------+-----------------------------------------------------
id              | integer                     | not null default nextval('product_id_seq'::regclass)
name            | character varying(100)      |
number_of_items | integer                     |
created_at      | timestamp without time zone | default now()
updated_at      | timestamp without time zone | default now()
total_number    | integer                     |
 Indexes:
    "pk_product" PRIMARY KEY, btree (id)

product_days.product_id is a foreign key referring to the product table and available represents the number of products available in each day.

I want to get all products that are available (available > 0) at some specific days (days_id between 5 and 10). They should be available at all of these days.

Currently I am trying to get the result with the following query but I am not sure whether it is correct or this is the most efficient way to do it:

select product.id as p_id, product.name as p_name, product.number_of_items as items
from product_days join product ON product_days.product_id = product.id
WHERE product_days.available > 0
AND  prodcut_days.days_id between 5 and 10
group by product.id
HAVING count(*) > 5;

The output should be like:

 p_id  | p_name    | items
-------+-----------+-------
 1     | product_1 | 4
 2     | product_2 | 13

I need the most efficient way of running this query either in SQL or in plpgsql.


Solution

  • After clarifying the typos, yes, the query should do what you describe.

    This is equivalent and a bit faster:

    SELECT id AS p_id, name AS p_name, number_of_items AS items
    FROM  (
       SELECT product_id AS id
       FROM   product_days
       WHERE  available > 0
       AND    days_id BETWEEN 5 AND 10
       GROUP  BY 1
       HAVING count(*) > 5
       ) d
    JOIN  product p USING (id);
    

    There should be a UNIQUE constraint enforcing max 1 entry per product and day:

    ALTER TABLE product_days ADD CONSTRAINT your_name_here UNIQUE (product_id, days_id);
    

    Related:

    Optimize read performance

    If you need to optimize performance for this particular query and there are more than a few non-qualifying rows (not available or day_id not matching), the sharpest weapon would be a partial index:

    CREATE INDEX idx_name_here ON product_days (product_id)
    WHERE  available > 0
    AND    days_id BETWEEN 5 AND 10;
    

    If you have autovacuum running and the write load on the table isn't too big, you should see index-only scans on this index.

    And while this index (or a more generic one) is used, this query should be faster, yet:

    SELECT id AS p_id, name AS p_name, number_of_items AS items
    FROM  (
       SELECT product_id AS id
       FROM   product_days d5
       JOIN   product_days d6  USING (product_id)
       JOIN   product_days d7  USING (product_id)
       JOIN   product_days d8  USING (product_id)
       JOIN   product_days d9  USING (product_id)
       JOIN   product_days d10 USING (product_id)
       WHERE  d5.days_id  = 5  AND d5.available  > 0
       AND    d6.days_id  = 6  AND d6.available  > 0
       AND    d7.days_id  = 7  AND d7.available  > 0
       AND    d8.days_id  = 8  AND d8.available  > 0
       AND    d9.days_id  = 9  AND d9.available  > 0
       AND    d10.days_id = 10 AND d10.available > 0   
       ) d
    JOIN  product p USING (id);
    

    Since this is a case of at its core. See: