apache-spark, pyspark

How to solve the maximum view depth error in Spark?


I have a very long task that creates a bunch of views using Spark SQL and I get the following error at some step: pyspark.sql.utils.AnalysisException: The depth of view 'foobar' exceeds the maximum view resolution depth (100).
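
For reference, the limit comes from spark.sql.view.maxNestedViewDepth (default 100), which caps how deeply views may reference other views. A minimal sketch (with made-up view names, not my actual job) of how a long chain of temporary views can hit it on a recent Spark version:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("view-depth-repro").getOrCreate()

    # Base data for the chain (placeholder for the real input tables).
    spark.range(10).createOrReplaceTempView("view_0")

    # Each view selects from the previous one, so the last view sits more
    # than 100 view references deep.
    for i in range(1, 102):
        spark.sql(
            f"CREATE OR REPLACE TEMPORARY VIEW view_{i} AS "
            f"SELECT * FROM view_{i - 1}"
        )

    # Resolving the top of the chain exceeds the maximum view resolution
    # depth and raises the AnalysisException shown above.
    spark.sql("SELECT * FROM view_101").show()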

I have searched Google and SO and couldn't find anyone with a similar error.

I have tried caching the view foobar, but that doesn't help. I'm thinking of creating temporary tables as a workaround, since I would prefer not to change the current Spark configuration if possible, but I'm not sure whether I'm missing something.

UPDATE: I tried creating tables in Parquet format so the query references tables instead of views, but I still get the same error. I applied that to all the input tables of the SQL query that causes the error.

If it makes a difference, I'm using ANSI SQL, not the Python API.


Solution


    Using Parquet tables worked for me after all. I spotted that I was still missing one table to persist, which is why it wasn't working before.

    So I changed my SQL statements from this:

    CREATE OR REPLACE TEMPORARY VIEW `VIEW_NAME` AS
    SELECT ...
    

    To:

    CREATE TABLE `TABLE_NAME` USING PARQUET AS
    SELECT ...
    

    This moves all the critical views to Parquet tables under spark-warehouse/ (or whatever warehouse directory you have configured).
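
    If some of the intermediate results are built through the DataFrame API rather than SQL, a roughly equivalent way to materialize them (the table name and query below are placeholders) is saveAsTable:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("materialize-example").getOrCreate()

    # Placeholder for the query that used to define the temporary view.
    df = spark.range(100).withColumnRenamed("id", "value")

    # Writes a managed Parquet table into the configured warehouse directory,
    # so downstream SQL reads materialized data instead of resolving a view.
    df.write.format("parquet").mode("overwrite").saveAsTable("TABLE_NAME")

    spark.sql("SELECT COUNT(*) FROM TABLE_NAME").show()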

    Note:

    This will write the table to the configured warehouse directory, which on a single-node setup means the master node's local disk. Make sure you have enough disk space, or consider writing to an external data store such as S3. An alternative - and now preferred - solution is to use checkpoints instead.
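
    For the checkpoint approach, a minimal sketch (the checkpoint directory, view name, and query are placeholders, and it assumes the intermediate result is available as a DataFrame):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("checkpoint-example").getOrCreate()

    # Checkpoints need a reliable directory; on a real cluster point this at
    # HDFS or S3 rather than local disk.
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    # Placeholder for one of the intermediate results in the long chain.
    df = spark.range(100).withColumnRenamed("id", "value")

    # checkpoint() materializes the data and truncates the logical plan, so
    # views defined on top of it no longer count the whole upstream chain.
    df = df.checkpoint()

    df.createOrReplaceTempView("foobar")
    spark.sql("SELECT COUNT(*) FROM foobar").show()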