I submit a spark job first like this in a pyspark file
os.system(f'spark-submit --master local --jars ./examples/lib/app.jar app.py')
Then in the submitted app.py file, I create a new SparkSession like this:
spark = SparkSession.builder.appName(appName) \
.config('spark.jars') \
.getOrCreate()
Error message:
23/01/17 11:02:52 INFO SparkContext: Running Spark version 3.3.0
23/01/17 11:02:52 INFO ResourceUtils: ==============================================================
23/01/17 11:02:52 INFO ResourceUtils: No custom resources configured for spark.driver.
23/01/17 11:02:52 INFO ResourceUtils: ==============================================================
23/01/17 11:02:52 INFO SparkContext: Submitted application: symbolic_test
23/01/17 11:02:52 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/01/17 11:02:52 INFO ResourceProfile: Limiting resource is cpu
23/01/17 11:02:53 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/01/17 11:02:53 INFO SecurityManager: Changing view acls to: annie
23/01/17 11:02:53 INFO SecurityManager: Changing modify acls to: annie
23/01/17 11:02:53 INFO SecurityManager: Changing view acls groups to:
23/01/17 11:02:53 INFO SecurityManager: Changing modify acls groups to:
23/01/17 11:02:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(annie); groups with view permissions: Set(); users with modify permissions: Set(annie); groups with modify permissions: Set()
23/01/17 11:02:53 INFO Utils: Successfully started service 'sparkDriver' on port 42141.
23/01/17 11:02:53 INFO SparkEnv: Registering MapOutputTracker
23/01/17 11:02:53 INFO SparkEnv: Registering BlockManagerMaster
23/01/17 11:02:53 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/01/17 11:02:53 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/01/17 11:02:53 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/01/17 11:02:53 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e4cc3b01-a6d5-4454-ad2d-4d0f42066479
23/01/17 11:02:53 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
23/01/17 11:02:53 INFO SparkEnv: Registering OutputCommitCoordinator
23/01/17 11:02:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/01/17 11:02:53 ERROR SparkContext: Failed to add None to Spark environment
java.io.FileNotFoundException: Jar /home/annie/exampleApp/example/None not found
at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1949)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:2004)
at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:507)
at org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:507)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:507)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
when creating spark session through pyspark, I get the following error messages, which only arise when I add .config('spark.jars').
I've set my $SPARK_HOME variable correctly... Any help will be appreciated!
If your code sample is true you do not assign any value to spark.jars key while creating spark session. Assigning jar path as value may solve the error.
SparkSession.builder.appName(appName) \
.config('config_key', config_value) \