SAP Vora 1.2 - Reading Vora tables from HANA

!!! UPDATE !!!

Finally, after hours of looking through the documentation, I found the issue. It turns out that some parameters were missing from my Yarn configuration.

This is what I did:

  1. Open the yarn-site.xml file in an editor, or log in to the Ambari web UI and select Yarn>Config. Locate the property "yarn.nodemanager.aux-services" and add "spark_shuffle" to its current value. The new property value should be "mapreduce_shuffle,spark_shuffle".
  2. Add or edit the property "yarn.nodemanager.aux-services.spark_shuffle.class", and set it to "".
  3. Copy the spark--yarn-shuffle.jar file (downloaded in the step Install Spark Assembly Files and Dependent Libraries) from Spark to the Hadoop-Yarn class path on all the node manager hosts. Typically this folder is located in /usr/hdp//hadoop-yarn/lib.
  4. Restart Yarn and the node manager.
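
To double-check steps 1 and 2 after editing, something like the following could be run against yarn-site.xml. This is only a minimal sketch: it assumes the usual Hadoop layout of `<property>` elements with `<name>` and `<value>` children, and the function names are made up for illustration. The class name for step 2 is not shown in this post, so the sketch only checks that some non-empty value is set.

```python
import xml.etree.ElementTree as ET

AUX_SERVICES = "yarn.nodemanager.aux-services"
SHUFFLE_CLASS = "yarn.nodemanager.aux-services.spark_shuffle.class"


def properties_from_root(root):
    """Collect {name: value} from Hadoop-style <property> elements."""
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_spark_shuffle(props):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    raw = props.get(AUX_SERVICES, "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    if "spark_shuffle" not in services:
        problems.append(f"{AUX_SERVICES} does not list spark_shuffle (found: {services})")
    # The expected class name is not given here, so only test for a non-empty value.
    if not props.get(SHUFFLE_CLASS):
        problems.append(f"{SHUFFLE_CLASS} is missing or empty")
    return problems


def check_file(path):
    """Parse a yarn-site.xml file from disk and run the checks on it."""
    return check_spark_shuffle(properties_from_root(ET.parse(path).getroot()))
```

Calling `check_file` with the path to yarn-site.xml on a node manager host returns an empty list when both properties look right. It does not verify that the node manager was actually restarted.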


I'm using SAP Vora 1.2 Developer Edition with the newest Spark Controller (HANASPARKCTRL00P_5-70001262.RPM). I loaded a table into Vora in spark-shell. I can see the table in SAP HANA Studio in the "spark_velocity" folder, and I can add it as a virtual table. The problem is that I cannot select or preview the data in the table because of this error:

Error: SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database for query "SELECT "SPARK_testtable"."a1", "SPARK_testtable"."a2", "SPARK_testtable"."a3" FROM "spark_velocity"."testtable" "SPARK_testtable" LIMIT 200 "

Here is my hanaes-site.xml file:

    <!--  You can either copy the assembly jar into HDFS or to lib/external directory.
    Please maintain appropriate value here-->
    <!--  Required if you are copying your files into HDFS-->
    <!--Required property if using controller for DLM scenarios-->
    <!-- Change this value to vora when connecting to Vora store -->

    <!-- // When running against a kerberos protected cluster, please maintain appropriate values
        <value>[email protected]</value>
    <!-- To enable Secure Socket communication, please maintain appropriate values in the follwing section-->

    <!-- Enable the following section if you want to enable dynamic allocation-->


ls /usr/sap/spark/controller/lib/external/


hdfs dfs -ls /sap/hana/spark/libs/thirdparty

Found 4 items
-rwxrwxrwx   3 hdfs hdfs     366565 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-api-jdo-4.2.1.jar
-rwxrwxrwx   3 hdfs hdfs    2006182 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-core-4.1.2.jar
-rwxrwxrwx   3 hdfs hdfs    1863315 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-rdbms-4.1.2.jar
-rwxrwxrwx   3 hdfs hdfs     627814 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/joda-time-2.9.3.jar

ls /usr/hdp/  current

vi /var/log/hanaes/hana_controller.log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/spark-sap-datasources-1.2.33-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/external/spark-assembly-!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/12 07:02:38 INFO HanaESConfig: Loaded HANA Extended Store Configuration
Found Spark Libraries. Proceeding with Current Class Path
16/05/12 07:02:39 INFO Server: Starting Spark Controller
16/05/12 07:03:11 INFO CommandRouter: Connecting to Vora Engine
16/05/12 07:03:11 INFO CommandRouter: Initialized Router
16/05/12 07:03:11 INFO CommandRouter: Server started
16/05/12 07:03:43 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729323_f17e36cf-0003-0015-452e-800c700001ee
16/05/12 07:03:48 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729329_f17e36cf-0003-0015-452e-800c700001f4
16/05/12 07:03:48 INFO VoraClientFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:03:48 INFO CBinder: searching for at /opt/rh/SAP/lib64/
16/05/12 07:03:48 WARN CBinder: could not find
16/05/12 07:03:48 INFO CBinder: searching for at /lib64/
16/05/12 07:03:48 INFO CBinder: loading from /lib64/
16/05/12 07:03:48 INFO CBinder: loading library
16/05/12 07:03:48 INFO CBinder: loading library
16/05/12 07:03:48 INFO CBinder: loading library
16/05/12 07:03:48 INFO CBinder: loading library
16/05/12 07:03:48 INFO CBinder: loading library
16/05/12 07:03:48 INFO CBinder: loading library
16/05/12 07:03:48 INFO CBinder: loading library
16/05/12 07:03:48 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:11:56 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729335_f17e36cf-0003-0015-452e-800c700001fa
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:11 INFO Utils: freeing the buffer
16/05/12 07:14:15 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/12 07:14:15 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 0 cancelled part of cancelled job group f17e36cf-0003-0015-452e-800c70000216
        at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:902)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:900)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(

Also strange is this error:

16/05/12 07:03:48 INFO CBinder: searching for at /opt/rh/SAP/lib64/
16/05/12 07:03:48 WARN CBinder: could not find

Because I have this file in the location:

ls /opt/rh/SAP/lib64/

After changing into now the log file looks different:

16/05/17 10:04:08 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155190_7e6efa3c-0003-0015-4a91-a3b020000139
16/05/17 10:04:13 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155196_7e6efa3c-0003-0015-4a91-a3b02000013f
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO HdfsBlockRetriever: Length of HDFS file (/user/vora/test.csv): 10 bytes.
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Loading table [testtable]
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Initialized 1 loading threads. Waiting until finished... -- 0.00 s
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host mapping (Ranges: 1/1 Size: 0.00 MB)
16/05/17 10:04:29 INFO VoraJdbcClient: [secondary2.i-a5361638.cluster:2202] MultiLoad: MULTIFILE
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host finished:
    Raw ranges: 1/1
    Size:       0.00 MB
    Time:       0.29 s
    Throughput: 0.00 MB/s
16/05/17 10:04:29 INFO TableLoader: Finished 1 loading threads. -- 0.29 s
16/05/17 10:04:29 INFO TableLoader: Updated catalog -- 0.01 s
16/05/17 10:04:29 INFO TableLoader: Table load statistics:
    Name: testtable
    Size: 0.00 MB
    Hosts: 1
    Time: 0.30 s
    Cluster throughput: 0.00 MB/s
    Avg throughput per host: 0.00 MB/s
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO TableLoader: Loaded table [testtable] -- 0.37 s
16/05/17 10:04:38 INFO Utils: freeing the buffer
16/05/17 10:06:43 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/17 10:06:43 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 1 cancelled part of cancelled job group 7e6efa3c-0003-0015-4a91-a3b02000015b
        at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:902)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:900)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(

It is still "not fetched by the client", but now it looks like Vora loaded the table.
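
To compare two controller runs like the ones above without scrolling through stack traces, it can help to pull out only the WARN and ERROR entries. A minimal sketch, assuming the `yy/MM/dd HH:mm:ss LEVEL Component: message` layout seen in the log excerpts; the name `select_entries` is made up for illustration:

```python
import re

# Matches lines such as "16/05/12 07:14:15 ERROR RequestOrchestrator: message".
# Stack-trace lines ("at ...") and SLF4J lines do not match and are skipped.
LOG_LINE = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<component>[^:\s]+): ?(?P<message>.*)$"
)


def select_entries(lines, levels=("WARN", "ERROR")):
    """Yield (time, level, component, message) for entries at the given levels."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            yield (
                match.group("time"),
                match.group("level"),
                match.group("component"),
                match.group("message"),
            )
```

Passing an open file handle for /var/log/hanaes/hana_controller.log as `lines` works, since the function only iterates line by line.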

Does anyone have an idea how to fix this? The same error appears when I try to read Hive tables instead of Vora tables.

Error: SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database for query "SELECT "vora_conn_testtable"."a1", "vora_conn_testtable"."a2", "vora_conn_testtable"."a3" FROM "spark_velocity"."testtable" "vora_conn_testtable" LIMIT 200 "


  • Finally after hours of looking into documentation I found the issue. It turns out that I lacked some parameters in Yarn configuration (don't know why this affected HANA-Vora connection).

    This is what I did:

    1. Open the yarn-site.xml file in an editor, or log in to the Ambari web UI and select Yarn>Config. Locate the property "yarn.nodemanager.aux-services" and add "spark_shuffle" to its current value. The new property value should be "mapreduce_shuffle,spark_shuffle".
    2. Add or edit the property "yarn.nodemanager.aux-services.spark_shuffle.class", and set it to "".
    3. Copy the spark--yarn-shuffle.jar file from Spark to the Hadoop-Yarn class path on all the node manager hosts. Typically this folder is located in /usr/hdp//hadoop-yarn/lib.
    4. Restart Yarn and the node manager.
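
    For the jar-copy step, a quick check on each node manager host might look like the sketch below. It assumes the jar is named like "spark-<version>-yarn-shuffle.jar" (the version is not shown in this post, so a wildcard is used), and the function name is hypothetical. The directory has to be supplied by the caller, since the exact path depends on the installed HDP version.

```python
from pathlib import Path

# Assumption: the shuffle jar is named "spark-<version>-yarn-shuffle.jar".
JAR_PATTERN = "spark-*-yarn-shuffle.jar"


def find_shuffle_jars(lib_dir):
    """Return the sorted file names of shuffle jars found directly in lib_dir."""
    directory = Path(lib_dir)
    if not directory.is_dir():
        # A missing directory is reported the same way as "no jar found".
        return []
    return sorted(p.name for p in directory.glob(JAR_PATTERN) if p.is_file())
```

    An empty result on any node manager host means the jar is not in that directory on that host.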