I am using cassandra-connector-assembly-2.0.0, built from the GitHub project with Scala 2.11.8, together with cassandra-driver-core-3.1.0. The job is:
sc.cassandraTable("mykeyspace", "mytable")
  .select("something")
  .where("key=?", key)
  .mapPartitions(par => par.map(row => (row.getString("something"), 1)))
  .reduceByKey(_ + _)
  .collect()
  .foreach(println)
The same job runs fine when reading a smaller amount of data; with more data it fails with:
java.lang.NoSuchMethodError: com.datastax.driver.core.ResultSet.fetchMoreResults()Lshade/com/datastax/spark/connector/google/common/util/concurrent/ListenableFuture;
at com.datastax.spark.connector.rdd.reader.PrefetchingResultSetIterator.maybePrefetch(PrefetchingResultSetIterator.scala:26)
at com.datastax.spark.connector.rdd.reader.PrefetchingResultSetIterator.next(PrefetchingResultSetIterator.scala:39)
at com.datastax.spark.connector.rdd.reader.PrefetchingResultSetIterator.next(PrefetchingResultSetIterator.scala:17)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:194)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Can anyone point out the issue and suggest a possible solution?
The problem is a conflict with the version of cassandra-driver-core that
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.0.0-M3"
brings in.
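In other words, the problem shows up when the connector and a newer driver are both declared side by side. A rough sketch of the conflicting setup (reconstructed from the versions mentioned above, not an exact copy of anyone's build file):

// build.sbt -- sketch of the conflicting setup (reconstructed, not verbatim)
libraryDependencies ++= Seq(
  // pulls in cassandra-driver-core 3.0.2 transitively
  "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.0.0-M3",
  // an explicit, newer driver overrides the 3.0.2 the connector was built against
  "com.datastax.cassandra" % "cassandra-driver-core" % "3.1.0"
)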
If you go into ~/.ivy2/cache/com.datastax.spark/spark-cassandra-connector_2.11, you will find a file called ivy-2.0.0-M3.xml.
In that file the dependency is declared as
<dependency org="com.datastax.cassandra" name="cassandra-driver-core" rev="3.0.2" force="true"/>
Note that it is the 3.0.2 version of cassandra-driver-core, which gets overridden by the more recent driver you added.
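A quick way to confirm which driver actually wins is to ask the JVM where the ResultSet class is being loaded from, for example from spark-shell (a sanity check, not part of the fix):

// Prints the jar that com.datastax.driver.core.ResultSet was loaded from.
// If this shows a cassandra-driver-core-3.1.0 jar, the newer driver has won
// over the 3.0.2 one the connector was built against.
// (getCodeSource can be null for boot-classpath classes, but not for a jar
// on the application classpath.)
println(classOf[com.datastax.driver.core.ResultSet]
  .getProtectionDomain
  .getCodeSource
  .getLocation)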
It just so happens that the latest source on GitHub does not show an implementation of fetchMoreResults, which is inherited from the PagingIterable interface.
If you roll the source back to the 3.0.x branch on GitHub, you'll find
public ListenableFuture<ResultSet> fetchMoreResults();
So it looks like the newest Cassandra core drivers were rushed out the door incomplete. Or I might be missing something. Hope this helps.
tl;dr: Remove the latest driver and use the one the Spark Cassandra Connector pulls in itself.
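In sbt terms that means keeping only the connector dependency and letting it resolve the driver transitively (a sketch; adjust to your own build):

// build.sbt -- keep only the connector; it brings in cassandra-driver-core 3.0.2 itself
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.0.0-M3"
// ...and remove any explicit "com.datastax.cassandra" % "cassandra-driver-core" % "3.1.0" line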