
Getting started with Spark (Datastax Enterprise)


I'm trying to set up and run my first Spark query, following the official example. On our local machines we have already installed the latest version of the DataStax Enterprise package (currently 4.7).

I did everything exactly according to the documentation and added the latest version of dse.jar to my project, but errors come up right from the beginning.

Here is the snippet from their example:

SparkConf conf = DseSparkConfHelper.enrichSparkConf(new SparkConf())
            .setAppName( "My application");
DseSparkContext sc = new DseSparkContext(conf);

Now it appears that the DseSparkContext class has only a default empty constructor.

Right after these lines comes the following:

JavaRDD<String> cassandraRdd = CassandraJavaUtil.javaFunctions(sc)
    .cassandraTable("my_keyspace", "my_table", mapColumnTo(String.class))
    .select("my_column");

And here comes the main problem: the CassandraJavaUtil.javaFunctions(sc) method accepts only a SparkContext as input, not a DseSparkContext (SparkContext and DseSparkContext are completely different classes, and one does not inherit from the other).

I assume the documentation is not up to date with the released version. If anyone has run into this problem before, please share your experience.

Thank you!


Solution

  • It looks like there is a bug in the docs. That should be

    DseSparkContext.apply(conf)
    

    since DseSparkContext is a Scala object, which uses the apply method to create new SparkContexts. In Scala you can just write DseSparkContext(conf), but in Java you must call the method explicitly. I know you don't have access to this code, so I'll make sure this gets fixed in the documentation and see if we can get better API docs up.
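
    Putting the pieces together, a corrected version of the original snippet might look like the sketch below. This assumes the DSE 4.7 class locations on the classpath (dse.jar plus the Spark Cassandra Connector); exact package names may differ between DSE releases, so check the jars you actually ship with:

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import com.datastax.bdp.spark.DseSparkConfHelper;
    import com.datastax.bdp.spark.DseSparkContext;
    import com.datastax.spark.connector.japi.CassandraJavaUtil;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapColumnTo;

    SparkConf conf = DseSparkConfHelper.enrichSparkConf(new SparkConf())
            .setAppName("My application");

    // DseSparkContext is a Scala object; from Java we invoke its
    // apply method explicitly. It returns a plain SparkContext,
    // which is exactly what javaFunctions(...) expects.
    SparkContext sc = DseSparkContext.apply(conf);

    JavaRDD<String> cassandraRdd = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("my_keyspace", "my_table", mapColumnTo(String.class))
        .select("my_column");

    Because apply returns a SparkContext rather than a wrapper type, no cast is needed and the connector's javaFunctions overloads resolve as documented.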