
Speeding up inner joins between a large table and a small table


This may be a silly question, but it may shed some light on how joins work internally.

Let's say I have a large table L and a small table S (100K rows vs. 100 rows).

Would there be any difference in speed between the following two options?

OPTION 1:                 OPTION 2:
---------                 ---------
SELECT *                  SELECT *
FROM L INNER JOIN S       FROM S INNER JOIN L
ON L.id = S.id;           ON L.id = S.id;

Notice that the only difference is the order in which the tables are joined.

I realize performance may vary between different database engines. If so, how would MySQL compare to Access?


Solution

  • No, the order does not matter.

    Almost all RDBMSs (such as MS Access, MySQL, SQL Server, Oracle, etc.) use a cost-based optimiser driven by column statistics. In most situations, the optimiser will choose a correct plan. In the example you gave, the order will not matter (provided statistics are up to date).
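
    For instance, in MySQL you can check this yourself with EXPLAIN. The sketch below assumes the L and S tables from the question, each with an indexed id column:

    -- Compare the plans for both join orders; with up-to-date statistics
    -- the optimiser typically produces the same plan for each, reading the
    -- small table S first and probing L through its index.
    EXPLAIN
    SELECT *
    FROM L INNER JOIN S ON L.id = S.id;

    EXPLAIN
    SELECT *
    FROM S INNER JOIN L ON L.id = S.id;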

    To decide which query strategy to use, the Jet Engine optimizer uses statistics. These statistics are based on factors such as:

    • The number of records in a table
    • The number of data pages in a table
    • The location of the table
    • Whether indexes are present
    • How unique the indexes are

    Note: You cannot view Jet database engine optimization schemes, and you cannot specify how to optimize a query. However, you can use the Database Documenter to determine whether indexes are present and how unique an index is.
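
    Since indexes are one of the few factors you can directly influence, it is usually worth making sure the join column is indexed on both sides. A minimal sketch, assuming the L and S placeholders from the question (a primary key on id may already provide an equivalent index):

    -- An index on the join column lets the optimiser probe the large table
    -- row by row instead of scanning it for every row of the small table.
    CREATE INDEX idx_L_id ON L (id);
    CREATE INDEX idx_S_id ON S (id);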

    Based on these statistics, the Optimizer then selects the best internal query strategy for dealing with a particular query.

    The statistics are updated whenever a query is compiled. A query is flagged for compiling when you save any changes to the query (or its underlying tables) and when the database is compacted. If a query is flagged for compiling, the compiling and the updating of statistics occurs the next time that the query is run. Compiling typically takes from one second to four seconds.

    If you add a significant number of records to your database, you must open and then save your queries to recompile them. For example, if you design and test a query by using a small set of sample data, you must recompile the query after additional records are added to the database. Doing this ensures that optimal query performance is achieved when your application is in use.
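
    The MySQL side of the comparison is similar in spirit: statistics are kept per table and can be refreshed explicitly after bulk changes. A hedged sketch, again using the question's placeholder table names:

    -- Rough MySQL analogue of recompiling an Access query: refresh the
    -- index statistics the optimiser relies on after a large data load,
    -- then use EXPLAIN to confirm the expected plan is still chosen.
    ANALYZE TABLE L, S;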

    Might be of interest: ACC: How to Optimize Queries in Microsoft Access 2.0, Microsoft Access 95, and Microsoft Access 97

    Tony Toews's Microsoft Access Performance FAQ is worth reading.

    There's a caveat to "JOIN order does not matter".

    If your RDBMS's cost-based query optimiser times out while creating the query plan, then the join order COULD matter. Cost-based optimisers have finite resources (both CPU time and memory) in which to construct a query plan. If they time out during the compilation stage, you get the best plan found so far.

    TL;DR: If you have complex queries that hit a plan compilation timeout (not a query execution timeout), put your most restrictive joins first. That way, when the query plan optimiser times out, there is a better chance that a 'better' plan has already been found.
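
    If you decide to pin the join order yourself rather than rely on the optimiser, some engines offer a hint for it. In MySQL, for example, STRAIGHT_JOIN joins the tables in the order they are written; the sketch below uses the question's placeholder tables (Access/Jet has no equivalent hint as far as I know):

    -- Force S to be read before L, overriding the optimiser's join order.
    -- This is MySQL-specific syntax.
    SELECT *
    FROM S STRAIGHT_JOIN L ON L.id = S.id;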

    Of course, if you are experiencing query plan compilation timeouts, you should probably simplify your query.