Search code examples
sqldatabaseperformanceoptimization

Guidance on efficient SQL many-to-many join query with multiple filters


I could use some guidance in building an optimized and efficient SQL query for the following entities:

1. Product:
 - id
 - state (active/archived)

2. Category:
 - id
 - name (e.g. Toiletries)

3. ProductCategory:
 - pid references Product.id
 - cid references Category.id

The target use case is to retrieve a list of products in a list of category ids and with state = 'active'

Now I have 3 options:

  1. Filter product where state = 'active' and ProductCategory where cid in input category IDs (as subqueries?) and join the 2 results

  2. Filter product where state = 'active' first, left join ProductCategory, then filter the results where cid in input category IDs

  3. Filter ProductCategory where cid in input category IDs, left join Products table, filter by state = 'active'

  4. Did I miss any?

My question is what query/filter strategy is recommended here, and why? Assume numProducts >>>> numCategories, numProducts < numProductCategories, since 1 product can have many categories.


Solution

  • As long as your SQL returns the right result then there is no right or wrong way of writing it - and how long it takes to run would be dependent on a lot of information you haven't supplied such as indexes and data volumes.

    If your chosen solution takes longer to run than you expected then it may be possible to tune it but we'd need to see the EXPLAIN PLAN for your query as well as the DDL for your tables.

    In addition, SQL is a declarative language not a procedural one. This means it declares what the result is that you want, not how to achieve that result. The query optimiser decides how to achieve the result so, in theory, all the ways that you've shown for writing the SQL could end up with exactly the same execution plan