I am trying to refactor a project that uses Spark with Cassandra from Scala 2.11 to Java 1.8. I am using spark-sql_2.11-2.3.1 and spark-cassandra-connector_2.11-2.3.1.
The code uses a few implicits as well as DataFrame:
import com.datastax.spark.connector._
import spark.implicits._
and
import org.apache.spark.sql.DataFrame
Now, how do I convert these into equivalent Java code? Any sample?
The import of DataFrame is not recognized/defined; it worked fine with Scala 2.11 but now it does not.
What am I doing wrong here, and how do I fix it?
There is no such thing as DataFrame in Java; it is always Dataset<Row>. In Scala, DataFrame is simply a type alias for Dataset[Row] (Spark defines type DataFrame = Dataset[Row] in the org.apache.spark.sql package object), which is why the import compiles there but has no Java counterpart.
Here is a minimal example of Java code that reads data from Cassandra via spark.sql:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkTest1 {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("CassandraSpark")
            .getOrCreate();

        // Dataset<Row> is the Java equivalent of Scala's DataFrame
        Dataset<Row> sqlDF = spark.sql("select * from datastax.vehicle limit 1000");
        sqlDF.printSchema();
        sqlDF.show();
    }
}
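For spark.sql to reach Cassandra at all, the connector needs to know where the cluster lives. A sketch of the session setup with that configuration added; the 127.0.0.1 host and local[*] master are assumptions for a local test run, not part of the original example:

```java
import org.apache.spark.sql.SparkSession;

public class SparkSessionSetup {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("CassandraSpark")
            .master("local[*]")  // assumption: local run for testing
            // assumption: Cassandra node reachable on localhost
            .config("spark.cassandra.connection.host", "127.0.0.1")
            .getOrCreate();
    }
}
```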
Or it could be done via spark.read (full code):
// ImmutableMap comes from Guava: import com.google.common.collect.ImmutableMap;
Dataset<Row> dataset = spark.read()
    .format("org.apache.spark.sql.cassandra")
    .options(ImmutableMap.of("table", "jtest", "keyspace", "test"))
    .load();
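If you would rather not pull in Guava just for ImmutableMap, the same read can be expressed with individual .option(...) calls; a sketch using the keyspace test and table jtest from the snippet above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CassandraReadOptions {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("CassandraSpark")
            .getOrCreate();

        // Each option() call replaces one entry of the ImmutableMap
        Dataset<Row> dataset = spark.read()
            .format("org.apache.spark.sql.cassandra")
            .option("keyspace", "test")
            .option("table", "jtest")
            .load();

        dataset.show();
    }
}
```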