Search code examples
sqlquery-optimizationanti-join

sql optimization with anti join


i have a recursive table of category and a company table with fields like:

category(id, name, parent) // parent is foreign key to category id :)
company(id, category_1, category_2, category_3) // category_* is foreign key to category id

category tree is at max depth=3 ;

category cx -> category cy -> category cz

with knowledge of company categories are always linked to last category (c3), i want all categories that a company is linked to (c1z, c2z, c3z, c1y, c2y, c3y, c1x, c2x, c3x) for my search engine. //c1y is the parent of category_1 and c1x is parent of parent of category 1...

the best query i came up with is :

SELECT
  ID,
  NAME
FROM category c3
WHERE ID IN (
    select category_1 from company where id=:companyId
  union
    select category_2 from company where id=:companyId
  union
    select category_3 from company where id=:companyId
  union
        select parent from category where id in (
          select category_1 from company where id=:companyId
          union
          select category_2 from company where id=:companyId
          union
          select category_3 from company where id=:companyId
        )
  union 
        select parent from category where id in (
          select parent from category where id in (
          select category_1 from company where id=:companyId
          union
          select category_2 from company where id=:companyId
          union
          select category_3 from company where id=:companyId
          )
        )
  )

it has so much duplicate in it. one for category_* in company. and one for repeating it multiple times.

any way to remove all this duplicates ?

--update--

suppose we solve the category-* field with using two tables what about recursive problem with 3 levels of category?

for example if there is only one category it would look like

SELECT
  ID,
  NAME
FROM category
WHERE ID IN (
  select category_1 from company where id=:companyId
  union
  select parent from category where id in (
    select category_1 from company where id=:companyId
  )
  union
  select parent from category where id in (
    select parent from category where id in (
      select category_1 from company where id=:companyId
    )
  )
);

Solution

  • If you want to join data, use something like this (SQL server example):

    DECLARE @category TABLE (id INT IDENTITY(1,1), name VARCHAR(30), parent INT) -- parent is foreign key to category id :)
    DECLARE @company TABLE (id INT IDENTITY(1,1), category_1 INT, category_2 INT, category_3 INT) --category_* is foreign key to category->id
    
    
    INSERT INTO @category (name, parent )
    VALUES('Top category', null), ('Cars', 1)
    
    INSERT INTO @company (category_1, category_2 , category_3 )
    VALUES(2, null, null), (2, 2, null), (2, 2, 2)
    
    
    SELECT t1.*, t2.*
    FROM @category AS t1 INNER JOIN @company AS t2 ON t1.id = t2.category_1 or t1.id = t2.category_2  or t1.id = t2.category_3 
    

    Above code produces:

    id  name    parent  id  category_1  category_2  category_3
    2   Cars    1   1   2   NULL    NULL
    2   Cars    1   2   2   2   NULL
    2   Cars    1   3   2   2   2
    

    But, such kind of database structure is wrong!

    Instead of one table

    company(id, category_1, category_2, category_3)
    

    create two tables

    company(id, name)
    comp_cat(id, comp_id, cat_id)
    

    Why? I don't want to answer directly, so i ask you: 1) what happens when company is related to more than 3 categories? 2) why to save nulls in case when second and third category is not set?

    In case of SQL Server, you can use Common Table Expressions:

    ;WITH CTE AS
    (
        SELECT id, category_1 AS cat_id
        FROM @company 
        WHERE NOT category_1 IS NULL
        UNION ALL
        SELECT id, category_2 AS cat_id
        FROM @company 
        WHERE NOT category_2 IS NULL
        UNION ALL
        SELECT id, category_3 AS cat_id
        FROM @company 
        WHERE NOT category_3 IS NULL
    )
    SELECT DISTINCT t1.*, t2.*
    FROM CTE AS t1 INNER JOIN @category AS t2 ON t1.cat_id = t2.id 
    

    Cheers, Maciej