mysql sql join indexing query-optimization

Should I create 2 indexes for the same column to speed up a join?

I am new to database index and I've just read about what an index is, differences between clustered and non clustered and what composite index is.

So for a inner join query like this:

SELECT columnA
FROM table1
INNER JOIN table2
ON table1.columnA= table2.columnA;

In order to speed up the join, should I create 2 indexes, one for table1.columnA and the other for table2.columnA , or just creating 1 index for table1 or table2?

One is good enough? I don't get it, for example, if I select some data from table2 first and based on the result to join on columnA, then I am looping through results one by one from table2, then an index from table2.columnA is totally useless here, because I don't need to find anything in table2 now. So I am needing a index for table1.columnA.

And vice versa, I need a table2.columnA if I select some results from table1 first and want to join on columnA.

Well, I don't know how in reality "select xxxx first then join based on ..." looks like, but that scenario just came into my mind. It would be much appreciated if someone could also give a simple example.

Solution

One index is sufficient, but the question is which one?

It depends on how the MySQL optimizer decides to order the tables in the join.

For an inner join, the results are the same for table1 INNER JOIN table2 versus table2 INNER JOIN table1, so the optimizer may choose to change the order. It is not constrained to join the tables in the order you specified them in your query.

The difference from an indexing perspective is whether it will first loop over rows of table1, and do lookups to find matching rows in table2, or vice-versa: loop over rows of table2 and do lookups to find rows in table1.

MySQL does joins as "nested loops". It's as if you had written code in your favorite language like this:

foreach row in table1 {
  look up rows in table2 matching table1.column_name
}

This lookup will make use of the index in table2. An index in table1 is not relevant to this example, since your query is scanning every row of table1 anyway.

How can you tell which table order is used? You can use EXPLAIN. It will show you a row for each table reference in the query, and it will present them in the join order.

Keep in mind the presence of an index in either table may influence the optimizer's choice of how to order the tables. It will try to pick the table order that results in the least expensive query.

So maybe it doesn't matter which table you add the index to, because whichever one you put the index on will become the second table in the join order, because it makes it more efficient to do the lookup that way. Use EXPLAIN to find out.