apache-spark pyspark aws-glue

AWS Glue endpoint with port forwarding: Spark program throws an error in Zeppelin


I have been struggling to find a solution to a problem running a Spark program in a Zeppelin notebook, and I am not sure what is going wrong.

I am using Zeppelin 0.7.3 (zeppelin-0.7.3-bin-all). I have created an AWS Glue endpoint and set up port forwarding to it.

I followed these links, but neither helped:
https://gist.github.com/codspire/7b0955b9e67fe73f6118dad9539cbaa2
https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-local-notebook.html

When I run this piece of Spark code at http://localhost:8080/:

%pyspark
a=5*4
print("value = %i" % (a))
sc.version

I get the following error:

org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Please help!


Solution

  • When you create your endpoint, you need to choose a Glue version that is compatible with your Zeppelin version. In your case (Zeppelin 0.7.3), you need Glue version 0.9.
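
If you create the endpoint from a script rather than the console, pinning the version could look roughly like the sketch below. It assumes boto3 and its Glue client's create_dev_endpoint call with the GlueVersion parameter. The helper names, endpoint name, role ARN and public key are hypothetical placeholders, not values from the question.

```python
def build_dev_endpoint_request(endpoint_name, role_arn, public_key, glue_version="0.9"):
    """Assemble keyword arguments for a Glue CreateDevEndpoint request.

    glue_version defaults to "0.9", the version suggested above for
    Zeppelin 0.7.3. All other values are supplied by the caller.
    """
    return {
        "EndpointName": endpoint_name,
        "RoleArn": role_arn,
        "PublicKey": public_key,      # SSH public key used later for port forwarding
        "GlueVersion": glue_version,  # pin the version explicitly
    }


def create_dev_endpoint(request):
    """Send the request to AWS Glue (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so the request can be built without boto3 installed

    glue = boto3.client("glue")
    return glue.create_dev_endpoint(**request)


if __name__ == "__main__":
    # Hypothetical placeholder values; replace them with your own.
    request = build_dev_endpoint_request(
        endpoint_name="zeppelin-endpoint",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        public_key="ssh-rsa AAAA... user@host",
    )
    # Only prints the request; pass it to create_dev_endpoint(request) to create the endpoint.
    print(request)
```

Building the request separately from the AWS call lets you check the pinned version before anything is created. After the endpoint is ready, set up port forwarding and restart the Zeppelin interpreter as before.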