Assume the two tables:
Table A: A1, A2, A_Other
Table B: B1, B2, B_Other
In the following examples, is something
is a condition checked against a fixed value, e.g. = 'ABC'
or < 45
.
I wrote a query like the following (1):
Select * from A
Where A1 IN (
Select Distinct B1 from B
Where B2 is something
And A2 is something
);
What I really meant to write was (2):
Select * from A
Where A1 IN (
Select Distinct B1 from B
Where B2 is something
)
And A2 is something;
Strangely, both queries returned the same result. When looking at the explain plan of query 1, it looked like when the subquery was executed, because the condition A2 is something
was not applicable to the subquery, it was deferred for use as a filter on the main query results.
I would normally expect query 1 to fail because the subquery by itself would fail:
Select Distinct B1 from B
Where B2 is something
And A2 is something; --- ERROR: column "A2" does not exist
But I find this is not the case, and Postgres defers inapplicable subquery conditions to the main query.
Is this standard behaviour or a Postgres anomaly? Where is this documented, and what is this feature called?
Also, I find that if I add a column A2
in table B
, only query 2 works as originally intended. In this case the reference A2
in query 2 would still refer to A.A2
, but the reference in query 1 would refer to the new column B.A2
because it is now applicable directly in the subquery.
Excellent question here, something that a lot of people come across but don't bother to stop and look.
What you are doing is writing a subquery in the WHERE
clause; not an inline view in the FROM
clause. There's the difference.
When you write a subquery in SELECT
or WHERE
clauses, you can access the tables that are in the FROM
clause of the main query. This doesn't happen only in Postgres, but it is a standard behaviour and can be observed in all the leading RDBMSes, including Oracle, SQL Server and MySQL.
When you run the first query, the optimizer takes a look at your entire query and determines when to check for which conditions. It is this behaviour of the optimizer that you see the condition is deferred to the main query because the optimizer figures out that it is faster to evaluate this condition in main query itself without affecting the end result.
If you run just the subquery, commenting out the main query, it is bound to return an error at the position that you have mentioned as the column that is being referred to is not found.
In your last paragraph, you have mentioned that you added a column A2
to table tableB
. What you have observed is right. That happens because of the implicit reference phenomenon. If you don't mention the table alias for a column, the database engine looks for the column first in the tables in FROM
clause of the subquery. Only if the column is not found there, a reference is made to the tables in main query. If you use the following query, it would still return the same result:
Select * from A aa -- Check the alias
Where A1 IN (
Select Distinct B1 from B bb
Where B2 is something
And aa.A2 is something -- Check the reference
);
Perhaps you can find more information in Korth's book on relational database, but I'm not sure. I have just answered your question based on my observations. I know this happens and why. I just don't know how I can provide you with further references.