mysql performance left-join execution

LEFT JOIN query takes too long to execute


I have 2 tables to join. The "product" table contains 257613 rows and has this structure:

id int(11) primary key autoincrement
id_category varchar(100)
name_category varchar(500)
name varchar(1000)
name_translated varchar(1000)
reference varchar(100)
link varchar(1000)
original_price varchar(45)
resell_price varchar(45)
active int(11)
ean varchar(16)
json_detail text
date_add date
date_update date

The "stores_product" table contains 181142 rows and has this structure:

id int(11) primary key autoincrement
reference varchar(128)
id_product int(11)
id_image_product int(11)
id_stock_product
id_store int(11)

This is the slow query in question:

SELECT * FROM product AS p 
LEFT JOIN stores_product AS sp ON p.reference = sp.reference 
WHERE sp.id_store = 3

This query never returns a result: I stopped the execution after 35 minutes. Are there too many rows to process, or have I made an error in the query?


Solution

  • There are a few things to note about this:

    1. An outer join brings no benefit when the where clause puts a non-null condition on a column of the joined table (in your case sp.id_store = 3). Any row the outer join adds with nulls in the sp columns fails that condition and is filtered out again. Use an inner join instead: the result is the same, and it is likely to be faster.

    2. If, on the other hand, you wanted the outer join to list all products, including those not in store 3, then your query is incorrect. In that case you must move the condition out of the where clause and into the on clause, like this:

      LEFT JOIN stores_product AS sp
             ON p.reference = sp.reference 
            AND sp.id_store = 3
      
    3. The join condition is not what one would expect. Normally it would be sp.id_product = p.id, but in the comments you explain that these two fields are unrelated. That is a very confusing way of naming things. Consider storing a foreign key in stores_product that references the primary key of the product table.

    4. Depending on how your data is distributed, you will get a benefit from one of the following two indexes -- which you need to create: stores_product(id_store, reference) or stores_product(reference, id_store).

    5. Obviously product(id) should be a primary key.
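
As a minimal sketch of point 1, and assuming the goal is to return only the products that have a row for store 3, the original query could be rewritten with an inner join. Table and column names are taken from the question; nothing else is assumed.

```sql
-- Same result as the original LEFT JOIN ... WHERE sp.id_store = 3:
-- the WHERE filter already discards every row in which the
-- stores_product columns would be NULL.
SELECT *
FROM product AS p
INNER JOIN stores_product AS sp
        ON p.reference = sp.reference
WHERE sp.id_store = 3;
```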

    Create the missing indexes, then look at the execution plan with explain select ... to see which ones are actually used.
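
The index suggestion in point 4 and the plan check can be sketched as follows. The index names idx_sp_store_reference and idx_sp_reference_store are hypothetical, chosen only for illustration; the column lists are the two candidates named above. This is one possible way to compare them, not the only one.

```sql
-- Candidate indexes from point 4 (index names are hypothetical).
CREATE INDEX idx_sp_store_reference ON stores_product (id_store, reference);
CREATE INDEX idx_sp_reference_store ON stores_product (reference, id_store);

-- Show the execution plan; the "key" column of the output names
-- the index, if any, that MySQL chose for each table.
EXPLAIN
SELECT *
FROM product AS p
INNER JOIN stores_product AS sp
        ON p.reference = sp.reference
WHERE sp.id_store = 3;

-- Once the plan shows which index is used, the other one can be
-- dropped. The index dropped here is only an example.
DROP INDEX idx_sp_reference_store ON stores_product;
```

Which of the two indexes the optimizer prefers depends on how the data is distributed, so the plan is worth checking against the real tables before dropping either one.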