java, apache-spark, apache-spark-sql, cassandra, spark-cassandra-connector

[spark-cassandra-connector] How to convert Scala implicit supported code to Java in Spark 2.3.1


I am trying to refactor a Spark-Cassandra project from Scala 2.11 to Java 1.8. I am using spark-sql_2.11-2.3.1 and spark-cassandra-connector_2.11-2.3.1.

A few implicits and DataFrame are used:

import com.datastax.spark.connector._
import spark.implicits._
import org.apache.spark.sql.DataFrame

Now, how do I convert these into equivalent Java code? Is there any sample?

The import of DataFrame is not recognized/defined. It was working fine with Scala 2.11, but now it is not.

What am I doing wrong here? How can I fix it?


Solution

  • There is no such thing as DataFrame in Java; it is always a Dataset of class Row. In Scala, DataFrame is simply a type alias for Dataset[Row].

    Here is a minimal example of Java code that reads data from Cassandra via spark.sql:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    
    public class SparkTest1 {
    
      public static void main(String[] args) {
        // Build (or reuse) the session; Cassandra connection settings such as
        // spark.cassandra.connection.host are usually passed via spark-submit
        SparkSession spark = SparkSession
            .builder()
            .appName("CassandraSpark")
            .getOrCreate();
    
        // DataFrame in Scala == Dataset<Row> in Java
        Dataset<Row> sqlDF = spark.sql("select * from datastax.vehicle limit 1000");
        sqlDF.printSchema();
        sqlDF.show();
      }
    
    }
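
    Note that spark.sql can resolve a Cassandra table only if it is registered in the catalog (DSE, for example, does this automatically). With the open-source connector, one option is to register the table as a temporary view first. A minimal sketch, assuming a hypothetical keyspace test and table jtest:

    // Register a Cassandra table as a temporary view so spark.sql can query it
    // (keyspace "test" and table "jtest" are assumed names for illustration)
    spark.sql("CREATE TEMPORARY VIEW jtest"
        + " USING org.apache.spark.sql.cassandra"
        + " OPTIONS (table \"jtest\", keyspace \"test\")");
    
    Dataset<Row> df = spark.sql("SELECT * FROM jtest LIMIT 10");
    df.show();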
    

    Alternatively, it can be done via spark.read (full code):

    // ImmutableMap here is Guava's com.google.common.collect.ImmutableMap
    Dataset<Row> dataset = spark.read()
            .format("org.apache.spark.sql.cassandra")
            .options(ImmutableMap.of("table", "jtest", "keyspace", "test"))
            .load();
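
    As for the other two imports from the question: import spark.implicits._ has no direct Java equivalent. Wherever Scala would summon an Encoder implicitly, Java code passes one explicitly via the Encoders factory. A minimal sketch (the sample data is made up):

    import java.util.Arrays;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    
    // In Scala, `import spark.implicits._` supplies Encoders implicitly;
    // in Java you hand one to createDataset (and similar methods) yourself
    Dataset<String> names = spark.createDataset(
            Arrays.asList("alice", "bob"), Encoders.STRING());
    names.show();

    For the RDD-level import com.datastax.spark.connector._, the connector ships a dedicated Java API in its japi package. A sketch, assuming the same test/jtest keyspace and table as above:

    import org.apache.spark.api.java.JavaSparkContext;
    import com.datastax.spark.connector.japi.CassandraJavaRDD;
    import com.datastax.spark.connector.japi.CassandraRow;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    
    // javaFunctions(...) replaces the Scala implicits that add
    // cassandraTable(...) onto the SparkContext
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
    CassandraJavaRDD<CassandraRow> rdd = javaFunctions(jsc)
            .cassandraTable("test", "jtest");
    System.out.println(rdd.first());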