sql postgresql postgresql-performance relational-division

Select rows with a specific column greater than zero in some related rows

I have the following tables in a Postgres 9.5 database:

product_days

Column     | Type    | Modifiers
-----------+---------+----------------------------------------------
id         | integer | not null default nextval('product_days_id_seq'::regclass)
days_id    | integer |
product_id | integer |
available  | integer | 
price      | integer |
 Indexes:
"pk_product_days" PRIMARY KEY, btree (id)
"idx_product_days" btree (days_id)
"idx_product_days_0" btree (product_id)
Foreign-key constraints:
   "product_days_days_id_fkey" FOREIGN KEY (days_id) REFERENCES days(id)

product

 Column         |            Type             |                       Modifiers
----------------+-----------------------------+-----------------------------------------------------
id              | integer                     | not null default nextval('product_id_seq'::regclass)
name            | character varying(100)      |
number_of_items | integer                     |
created_at      | timestamp without time zone | default now()
updated_at      | timestamp without time zone | default now()
total_number    | integer                     |
 Indexes:
    "pk_product" PRIMARY KEY, btree (id)

product_days.product_id is a foreign key referring to the product table and available represents the number of products available in each day.

I want to get all products that are available (available > 0) at some specific days (days_id between 5 and 10). They should be available at all of these days.

Currently I am trying to get the result with the following query but I am not sure whether it is correct or this is the most efficient way to do it:

select product.id as p_id, product.name as p_name, product.number_of_items as items
from product_days join product ON product_days.product_id = product.id
WHERE product_days.available > 0
AND  prodcut_days.days_id between 5 and 10
group by product.id
HAVING count(*) > 5;

The output should be like:

 p_id  | p_name    | items
-------+-----------+-------
 1     | product_1 | 4
 2     | product_2 | 13

I need the most efficient way of running this query either in SQL or in plpgsql.

Solution

After clarifying the typos, yes, the query should do what you describe.

This is equivalent and a bit faster:

SELECT id AS p_id, name AS p_name, number_of_items AS items
FROM  (
   SELECT product_id AS id
   FROM   product_days
   WHERE  available > 0
   AND    days_id BETWEEN 5 AND 10
   GROUP  BY 1
   HAVING count(*) > 5
   ) d
JOIN  product p USING (id);

There should be a UNIQUE constraint enforcing max 1 entry per product and day:

ALTER TABLE product_days ADD CONSTRAINT your_name_here UNIQUE (product_id, days_id);

Using a column in sql join without adding it to group by clause

Optimize read performance

If you need to optimize performance for this particular query and there are more than a few non-qualifying rows (not available or day_id not matching), the sharpest weapon would be a partial index:

CREATE INDEX idx_name_here ON product_days (product_id)
WHERE  available > 0
AND    days_id BETWEEN 5 AND 10;

If you have autovacuum running and the write load on the table isn't too big, you should see index-only scans on this index.

And while this index (or a more generic one) is used, this query should be faster, yet:

SELECT id AS p_id, name AS p_name, number_of_items AS items
FROM  (
   SELECT product_id AS id
   FROM   product_days d5
   JOIN   product_days d6  USING (product_id)
   JOIN   product_days d7  USING (product_id)
   JOIN   product_days d8  USING (product_id)
   JOIN   product_days d9  USING (product_id)
   JOIN   product_days d10 USING (product_id)
   WHERE  d5.days_id  = 5  AND d5.available  > 0
   AND    d6.days_id  = 6  AND d6.available  > 0
   AND    d7.days_id  = 7  AND d7.available  > 0
   AND    d8.days_id  = 8  AND d8.available  > 0
   AND    d9.days_id  = 9  AND d9.available  > 0
   AND    d10.days_id = 10 AND d10.available > 0   
   ) d
JOIN  product p USING (id);

Since this is a case of relational-division at its core. See: