Search code examples
mysqljoinindexingquery-optimization

What index should I use when using JOIN on PRIMARY KEY


I'm trying to optimise the following MySQL query

SELECT Hotel.HotelId, Hotel.Name, Hotel.Enabled, Hotel.IsClosed, 
HotelRoom.HotelId, HotelRoom.RoomId, HotelRoom.Name AS RoomName
FROM Hotel
INNER JOIN 
   HotelRoom ON Hotel.HotelId = HotelRoom.HotelId    
WHERE Hotel.IsClosed = 0
AND Hotel.Enabled = 1
AND HotelRoom.Deleted = 0
AND HotelRoom.Enabled = 1
AND IF(LENGTH(TRIM(sAuxiliaryIds)) > 0 AND sAuxiliaryIds IS NOT NULL, 
FIND_IN_SET(Hotel.AuxiliaryId, sAuxiliaryIds), 1=1) > 0 
ORDER BY Hotel.HotelId ASC, HotelRoom.RoomId ASC

The PRIMARY KEYS are Hotel.Hotel and HotelRoom.RoomId, and I've got a FOREIGN KEY from HotelRoom.HotelId to Hotel.HotelId.

Should I be creating a INDEX for (Hotel.IsClosed, Hotel.Enabled) and (HotelRoom.Deleted, HotelRoom.Enabled) which is used in the WHERE clause, and should this index include the PRIMARY key so for example I should create a INDEX for (Hotel.HotelId, Hotel.IsClosed, Hotel.Enabled)

EDIT 1 I've added the following in the WHERE statement AND IF(LENGTH(TRIM(sAuxiliaryIds)) > 0 AND sAuxiliaryIds IS NOT NULL, FIND_IN_SET(Hotel.AuxiliaryId, sAuxiliaryIds), 1=1) > 0 Should these also be included in INDEX

This is what the EXPLAIN statement is showing for this query

enter image description here

I added both INDEX suggestions but when I ran the EXPLAIN statement they both showed that no key was going to be used

enter image description here


Solution

  • MySQL's Optimizer does not care which table comes first in a JOIN. It will look at statistics (etc) to decide for itself whether to start with Hotel or HotelRoom. So, you should write indexes for both cases, so as not to restrict the Optimizer.

    MySQL almost always performs a JOIN by scanning one table. Then, for each row in that table, look up the necessary row(s) in the other table. See "Nested Loop Join" or "NLJ". This implies that the optimal indexes are (often) thus: For the 'first' table, columns of the WHERE clause involving the first table. For the second table, the columns from both the WHERE and ON clauses involving the second table.

    Assuming that the Optimizer started with Hotel:

    Hotel: INDEX(IsClosed, Enabled)   -- in either order
    HotelRoom:  INDEX(Deleted, Enabled, HotelId)  -- in any order
    

    If it started with HotelRoom:

    HotelRoom:  INDEX(Deleted, Enabled)  -- in either order
    Hotel: PRIMARY KEY(HotelId)  -- which you already have?
    

    If there are a lot of closed/disabled hotels, then this may be beneficial:

    Hotel: INDEX(IsClosed, Enabled, HotelId)
    

    As Tim mentioned, it may be beneficial to augment an index to include the rest of the columns mentioned, thereby making the index "covering". (But don't do this with the PRIMARY KEY or any UNIQUE key.)

    If you provide SHOW CREATE TABLE and the sizes of the tables, we might have further suggestions.