Search code examples
phpmysqlperformanceunionunion-all

Performance of UNION vs UNION ALL


I am selecting a single column of foreign keys from multiple tables through either UNION or UNION ALL.

It is generally recommended to use UNION ALL instead of UNION for performance issues when duplicates do not matter. However, in my calling PHP script it would be more efficient to loop through and manipulate the data without duplicates.

So, I can use either of the following options:

Option 1:

Use UNION in the database to eliminate duplicates

Option 2:

use UNION ALL in the database and eliminate the data in my PHP script using array_unique() or other similar functions.

My assumption is that Option 1 would be the preferred and more efficient method in the majority of cases, however I have nothing to back up that assumption, and not sure the best way to test it especially since it would likely depend a lot on what the data was.

Is my assumption correct in most cases? If so, why? If not, why not?


Solution

  • The mainly aspect is that UNION is shortcut for UNION DISTINCT and so

    the difference in performance between UNION and UNION ALL are related to the
    need to obtain a distinct result and for this the database engine and the query optimizer are surely more effective and most efficient than the filtring alogoritm based on PHP code in application.

    The dictinct Operation can, moreover, benefit from the pre-optimizations for group by functionality

    Not only, the duplicate data filtering is generally based on ordered data while the select sql functions work without explicit ordering, and therefore the need for filtering data with the application can lead to less efficient and more longer queries.

    Generally the db engine is much more efficient that application PHP functions code so the Option 1 is generally the better choise