Bugs and sharp corners are part of life. I'm not overly concerned that it's happening. More concerned with how I can figure out why because this means there's something how spark parses queries that I fundamentally don't understand. Moreover, I can't figure any reason why you'd deliberately want to do the behavior that forces this error. But it's easy to accidentally do (define an unused CTE) and it leaves you with an error that is a nightmare to figure out.
Is there anything beyond "explain yada_yada" that I can do to further look under the hood of failed sql commands?
Screenshots are provided just for proof that I'm not losing my mind, pertinent code included as plain text as well.
Runtime: DBR 13.3 LTS
Spark Version: 3.4.1
Set Up and Summary: N temporary views which run successfully in isolation. Final dependent temporary view. Same behavior is observed using global or local temp views.
When the primary temporary views are referenced in CTE's in the final dependent view, all CTE's must be in the final select's namespace (i.e. joined to), or the primary view falls out of scope of the namespace completely.
create or replace global temporary view gtv_1 as
select 1 as dummy_1;
create or replace global temporary view gtv_2 as
select 1 as dummy_2;
This works (note the superfluous join to cte2 despite not being in the select):
create or replace global temporary view gtv_3 as
with cte1 as (
select dummy_1
from global_temp.gtv_1
),
cte2 as (
select dummy_2
from global_temp.gtv_2
)
select cte1.*
from cte1
inner
join cte2
on cte1.dummy_1 = cte2.dummy_2;
select *
from global_temp.gtv_3;
dummy_1
1
This does not work: Note the absence of the superfluous join
create or replace global temporary view gtv_3 as
with cte1 as (
select dummy_1
from global_temp.gtv_1
),
cte2 as (
select dummy_2
from global_temp.gtv_2
)
select cte1.*
from cte1;
select *
from global_temp.gtv_3;
Fails with error:
[TABLE_OR_VIEW_NOT_FOUND] The table or view global_temp
.gtv_2
cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 7 pos 13
explain select * from global_temp.gtv_3;
Issue | Description |
---|---|
Error Type | Query Planning Error |
Error Message | [TABLE_OR_VIEW_NOT_FOUND] The table or view global_temp .gtv_2 cannot be found. Verify the spelling and correctness of the schema and catalog. If not qualified, check the current_schema() output or qualify the name with the correct schema and catalog. To tolerate the error on drop, use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS . |
Error Location | Line 7, Position 13 |
EDIT:
Traced it to this configuration option
Won't pretend that I'm qualified to actually explain why this matters. Looks like behavior that was introduced solving something else with nested CTE naming. But just looking at the code I'm satisfied that CTE references are sufficiently complex that oddities are understandable.
So this works, with LEGACY and only LEGACY.
%sql
set spark.sql.legacy.ctePrecedencePolicy = LEGACY; -- LEGACY works. Switching to CORRECTED or EXCEPTION does not.
create or replace temporary view gtv_1 as
select 1 as dummy_1;
create or replace temporary view gtv_2 as
select 1 as dummy_2;
create or replace temporary view gtv_3 as
with cte1 as (
select dummy_1
from gtv_1
),
cte2 as (
select dummy_2
from gtv_2
)
select cte1.*
from cte1;
select *
from gtv_3
Moving edit to an answer
This configuration option influences the name scope of cte's.
Can't explain how exactly that code works, but keeping track of cte references looks complex enough that they can be forgiven for edge cases.
This works, with LEGACY and only LEGACY.
%sql
set spark.sql.legacy.ctePrecedencePolicy = LEGACY; -- LEGACY works. Switching to CORRECTED or EXCEPTION does not.
create or replace temporary view gtv_1 as
select 1 as dummy_1;
create or replace temporary view gtv_2 as
select 1 as dummy_2;
create or replace temporary view gtv_3 as
with cte1 as (
select dummy_1
from gtv_1
),
cte2 as (
select dummy_2
from gtv_2
)
select cte1.*
from cte1;
select *
from gtv_3