Tags: java, string, apache-spark, apache-spark-sql, apache-spark-dataset

How to convert a Spark Dataset<Row> into a String?


I have written code to access a Hive table using Spark SQL. Here is the code:

SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate();
Dataset<Row> df =  spark.sql("select survey_response_value from health").toDF();
df.show();

How can I convert the complete output to a String or a String array? I need to pass it to another module that only accepts String or String[] values.
I have tried calling .toString() and casting to String, but neither worked.
Please let me know how to convert the Dataset values to String.


Solution

  • Here is sample code in Java that collects a single-column Dataset<Row> into a List<String>, using either df.as or df.map.

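    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkSample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession
                .builder()
                .appName("SparkSample")
                .master("local[*]")
                .getOrCreate();

            // Create a single-column DataFrame of strings
            List<String> myList = Arrays.asList("one", "two", "three", "four", "five");
            Dataset<Row> df = spark.createDataset(myList, Encoders.STRING()).toDF();
            df.show();

            // Option 1: df.as reinterprets the single string column as Dataset<String>
            List<String> listOne = df.as(Encoders.STRING()).collectAsList();
            System.out.println(listOne);

            // Option 2: df.map turns each Row into a String; the MapFunction cast
            // avoids an ambiguous-overload error against the Scala Function1 variant of map
            List<String> listTwo = df
                .map((MapFunction<Row, String>) row -> row.mkString(), Encoders.STRING())
                .collectAsList();
            System.out.println(listTwo);

            spark.stop();
        }
    }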
    

    "row" is a Java 8 lambda parameter; it stands for each Row of the Dataset as it is mapped. For an introduction to lambda expressions, see developer.com/java/start-using-java-lambda-expressions.html
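
The question asks for a String or a String array rather than a List<String>, so one more step may be needed after collectAsList(). The following is a minimal sketch of that step in plain Java, with no Spark dependency. The class name CollectedValues and its two helper methods are hypothetical names introduced here for illustration, and the hard-coded list stands in for a collected result such as listOne.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the original answer): converts values that
// were already collected from a Dataset into a String[] or a single String.
class CollectedValues {

    // List<String> -> String[]
    static String[] toArray(List<String> values) {
        return values.toArray(new String[0]);
    }

    // List<String> -> one String, with the values separated by the given delimiter
    static String join(List<String> values, String delimiter) {
        return String.join(delimiter, values);
    }

    public static void main(String[] args) {
        // Stand-in for the result of df.as(Encoders.STRING()).collectAsList()
        List<String> collected = Arrays.asList("one", "two", "three", "four", "five");

        String[] asArray = toArray(collected);
        String asSingleString = join(collected, ",");

        System.out.println(Arrays.toString(asArray));
        System.out.println(asSingleString);
    }
}
```

Note that collectAsList() brings every row to the driver, so this approach suits results that fit comfortably in driver memory.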