Search code examples
sqlquery-optimizationsap-asequery-performancesql-execution-plan

SQL union optimized into a left join, faster but Query Plan says cost more I/O


select id, c.name as name
from a join b on a.id=b.id
join c on a.id=c.id
union
select id, d.name as name
from a join b on a.id=b.id
join d on a.id=d.id

optimized to

select id, 
       case when c.name is not null or c.name <> '' then c.name 
       else d.name end as name
from a join b on a.id=b.id
left join c on a.id=c.id
left join d on a.id=d.id
where c.name is not null or d.name is not null

The query response time has improved from 30secs to 13secs.

  • sql union = 30secs
  • sql left join= 13secs

However upon checking the query plan, the sql union has lower I/O cost, see below:

  • sql union = Total estimated I/O cost for statement 1 (at line 1): 6277566.
  • sql left join= Total estimated I/O cost for statement 1 (at line 1): 10481124.

I'm using Sybase 12.5 ASE and the query plan was from DBArtisan 8.5; let me know if I need to upload the whole query plan. I'm not exactly very familiar yet with query plans but I do sql optimizations here and there, normally I just base it on the time improvements. Also I did check that the resultset are the same for both queries (27949 rows). Also I masked and simplified the table names.

My question is, does that mean the sql left join is faster but more resource intensive? And if so, should I still choose the faster alternative?


Solution

  • The database will do some caching internally so execution time is not always the best indicator. If you run the first query and then run the second one right after it, the second is at an unfair advantage because some of the data will likely be cached.

    Like all database tuning questions, nothing is really set in stone. I personally like the union because I think it's a bit more readable but strictly from a performance perspective I would do some extended testing over a longer period of time (to minimize the influence of caching) and see how they perform.

    How much data is there in these tables? Do you have indexes on the id columns in the four tables? If not, that will speed your query up more any change to the sql.