Tags: apache-spark, apache-spark-sql

Spark Connect SQL Parsing Exception - Spark 3.5.3 and 3.5.4


I'm running Spark Connect locally and I keep getting parsing exceptions any time I try to run a pure SQL query, even extremely basic ones.

I'm running this on a fresh Spark 3.5.4 install:

$SPARK_HOME/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.4

Here's the SQL query I'm running after creating the Spark Connect session:

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("select 1 as test").show()

Here's the stack trace:

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:257)
    at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:98)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:54)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(AbstractSqlParser.scala:68)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$5(SparkSession.scala:684)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:683)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
    at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleSqlCommand(SparkConnectPlanner.scala:2469)
    at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:2434)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.handleCommand(ExecuteThreadRunner.scala:208)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:164)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:138)
    at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:189)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:189)
    at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
    at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withContextClassLoader$1(SessionHolder.scala:176)
    at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:178)
    at org.apache.spark.sql.connect.service.SessionHolder.withContextClassLoader(SessionHolder.scala:175)
    at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:188)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:138)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:90)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:235)

It looks like the exception actually originates here, but my debugger won't let me step into that method any further: https://github.com/apache/spark/blob/v3.5.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AbstractSqlParser.scala#L69

Happy to try any config options.
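
One thing I can try to isolate this (a minimal diagnostic sketch, assuming the pip-installed PySpark is importable): run the identical query through a plain local session, without Spark Connect, to see whether the SQL itself is the problem.

# Diagnostic sketch: run the same query without Spark Connect.
# If this succeeds, the SQL is fine and the failure is somewhere in the
# Connect client/server round trip rather than in the query.
from pyspark.sql import SparkSession

local = SparkSession.builder.master("local[*]").getOrCreate()
local.sql("select 1 as test").show()
local.stop()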


Solution

  • Ended up figuring it out: I was using the system PySpark, which was a 4.0.0 preview, so its Spark Connect protobuf didn't match the 3.5.4 server and the requests weren't parsed correctly (see the sketch below).
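
A minimal sketch of the check and fix (assuming, as above, that the pip-installed PySpark client was a 4.0.0 preview while the server was 3.5.4):

# Sketch: the Spark Connect client and server need compatible versions.
# A 4.0.0-preview client uses a different Connect protobuf than a 3.5.4
# server, so requests get mis-decoded and surface as ParseExceptions.
import pyspark

print(pyspark.__version__)  # was a 4.0.0 preview here; should be 3.5.4

# Fix, from the shell: pin the client to the server's version, e.g.
#   pip install "pyspark[connect]==3.5.4"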