Search code examples
sqlsql-execution-plan

SQL query plan sequence


Context (not really necessary to answer the question)
We are trying to optimize the data structure for SPARQL on SQL queries and want to avoid unneccesary comparisons for join operations using a data format other than the N-Triples. We want to filter out data that is not relevant for the result/joins. Crucial for the runtime is the size of the join, so we have to know if the expression in the WHERE clause gets evaluated before the actual join. Assumably it is true, but we need to know for sure.

tl;dr
When is the WHERE clause evaluated? Does the standard specify this at all, or is it left to the implementation? I read about using the ON clause to force it. I'd apreciate if someone could reference this with proper sources.


Solution

  • SQL is a language that describes the results being produced. It does not describe the particular steps for producing them. That is, SQL is not a procedural language.

    A very important part of SQL engines is the optimizer. This turns the SQL statement into a series of steps, typically best understood as a dataflow. These steps might involve complex algorithms (such as parallel out-of-memory hash tables) or supporting structures such as indexes. Even simple SQL statements can have many different alternative execution plans considered by the optimizer.

    Hence, there is no answer to your question, because the execution of a SQL statement is not based specifically on the clauses. Of course, this depends on the underlying database. Some databases, such as MySQL and MS Access, have rather naive optimizers which are often easy to predict. Others such as Postgres, Oracle, SQL Server, Teradata, and DB2 have more complex optimizers.

    I would advise, moreover, that if you are using joins, then always use the explicit JOIN operator along with an ON clause.