Tags: mysql, sql, plsql, sql-tuning

Special case of Equi Join


I came across this particular script which uses a special form of equi join.

SELECT * 
FROM 
per_assignments a, per_assigment_types b
WHERE
a.assignment_status_type_id + 0  = b.assignment_status_type_id

Why is the zero added in the equi join? I have heard that it has something to do with avoiding an index search, but could someone explain the complete picture? Thanks in advance.

Edit:

It's not related to the table or column declarations. As far as I know, it has to do with SQL tuning.

This is what I found:

  1. This is used with smaller tables.
  2. Instead of doing an index search as it normally would, the database scans the complete table in one go.

But I don't really know how this differs from a normal equi-join, or how indexing affects performance.

It would be really helpful if someone could explain this in this particular context and let me know if my findings are wrong. I appreciate your time and effort :-)

Column Description:

The assignment status type IDs in both tables are declared as NUMBER(9).


Solution

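  • Adding 0 does not change the value being compared, but it turns the column into an expression, so the optimizer can no longer use an ordinary index on a.assignment_status_type_id for this predicate. The reason for killing the index use on small tables is performance. When an index is used to perform a join, reading a row typically takes two disk I/Os: one to read the index, and a second to fetch the row from the table itself. With smaller tables it can be faster to skip the index and read the whole table in a full table scan.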

    This is a broad generalization, and the outcome may vary over time even within your own database. In theory the SQL optimizer should be smart enough to recognize this condition and choose the full table scan over an index lookup even without the + 0 trick. It is also possible that, as data is added to one or both tables, the faster option shifts from the full table scan to the index lookup.

    The questions I have about tuning these queries would be:

    1. What are the precise definitions of the tables, including how full are VARCHAR columns (if any) on average?
    2. How many rows are in each table?
    3. How many rows are added to each table per day?
    4. How often is this query executed?
    5. Has anyone timed the query execution with both options to see which is faster?

    My concern is that this query was written as a clever performance enhancement, either for an earlier version of the database or simply as a hack, without realizing that the query optimizer may do as good a job or better on its own.
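
One way to check the effect rather than assume it is to compare the plans and timings of both forms of the join. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in, not the asker's database, and everything except the table names and the join column (the extra columns, the index name idx_assign_status, the row counts) is made up. SQLite's automatic transient indexes are switched off so that the plan reflects only the declared index. On Oracle the comparable check would be EXPLAIN PLAN for each variant, and the optimizer there may choose differently.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Keep SQLite from building its own temporary index for the join.
conn.execute("PRAGMA automatic_index = OFF")

# Hypothetical schema: only the join column is taken from the question.
conn.executescript("""
    CREATE TABLE per_assignments (
        assignment_id             INTEGER PRIMARY KEY,
        assignment_status_type_id INTEGER,
        note                      TEXT
    );
    CREATE TABLE per_assigment_types (
        assignment_status_type_id INTEGER,
        description               TEXT
    );
    CREATE INDEX idx_assign_status
        ON per_assignments (assignment_status_type_id);
""")

# Made-up sample data: 10 status types, 1000 assignments.
conn.executemany(
    "INSERT INTO per_assigment_types VALUES (?, ?)",
    [(i, f"type {i}") for i in range(10)],
)
conn.executemany(
    "INSERT INTO per_assignments VALUES (?, ?, ?)",
    [(i, i % 10, "x") for i in range(1000)],
)

PLAIN_JOIN = """
    SELECT * FROM per_assignments a, per_assigment_types b
    WHERE a.assignment_status_type_id = b.assignment_status_type_id
"""
PLUS_ZERO_JOIN = """
    SELECT * FROM per_assignments a, per_assigment_types b
    WHERE a.assignment_status_type_id + 0 = b.assignment_status_type_id
"""


def plan(sql):
    """Return the plan steps SQLite reports for a query (the 'detail' column)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def time_query(sql, repeats=20):
    """Return the average wall-clock seconds per execution over a few runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql).fetchall()
    return (time.perf_counter() - start) / repeats


for label, sql in (("plain", PLAIN_JOIN), ("+ 0", PLUS_ZERO_JOIN)):
    print(label, plan(sql), f"{time_query(sql):.6f}s")
```

In this setup the plain join should report a search through idx_assign_status, while the + 0 variant should fall back to scanning both tables. Timings on tiny in-memory tables like these say little about a production system, so the same comparison is worth repeating against realistic data volumes.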