Good morning, it maybe sounds like a stupid question, but I would like to access a temporary table in Spark by RStudio. I don't have any Spark cluster, and I only run everything local on my PC. When I start Spark through IntelliJ, the instance is running fine:
17/11/11 10:11:33 INFO Utils: Successfully started service 'sparkDriver' on port 59505.
17/11/11 10:11:33 INFO SparkEnv: Registering MapOutputTracker
17/11/11 10:11:33 INFO SparkEnv: Registering BlockManagerMaster
17/11/11 10:11:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/11 10:11:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/11 10:11:33 INFO DiskBlockManager: Created local directory at C:\Users\stephan\AppData\Local\Temp\blockmgr-7ca4e8fb-9456-4063-bc6d-39324d7dad4c
17/11/11 10:11:33 INFO MemoryStore: MemoryStore started with capacity 898.5 MB
17/11/11 10:11:33 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/11 10:11:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/11 10:11:34 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.25.240.1:4040
17/11/11 10:11:34 INFO Executor: Starting executor ID driver on host localhost
17/11/11 10:11:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59516.
17/11/11 10:11:34 INFO NettyBlockTransferService: Server created on 172.25.240.1:59516
But I am not sure about the port, I have to choose in RStudio/sparklyr:
sc <- spark_connect(master = "spark://localhost:7077", spark_home = "C://Users//stephan//Downloads//spark//spark-2.2.0-bin-hadoop2.7", version = "2.2.0")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\stephan\AppData\Local\Temp\Rtmp61Ejow\file2fa024ce51af_spark.log': Permission denied
I tried different ports, like 59516, 4040, ... but all led to the same result. The permission denied message I guess can be ignored due that the file is written fine:
17/11/11 01:07:30 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master localhost:7077
Can please anyone assist me, how I can establish a connection between a local running Spark and RStudio, but without that RStudio is running another Spark instance?
Thanks Stephan
Running standalone Spark cluster is not the same thing as running Spark in local
mode in your IDE, which is likely the case here. local
mode doesn't create any persistent services.
To run your own "pseudodistributed" cluster:
$SPARK_HOME/sbin/start-master.sh
script.$SPARK_HOME/sbin/start-slave.sh
script and passing master url.To be able to share tables you'll also need a proper metastore (not Derby).