
Converting a List of Arrays in Scala into a DataFrame?


I'm new to Scala. I'm reading some CSV data from a URL without saving it to a CSV file, and I'm storing that data in a List[Array[String]].

When I convert that list to a DataFrame, the result has a single column named "value", with each Array in the list becoming one row of that column. I'm trying to create a DataFrame with 15 columns instead, because each array has a length of 15. Any advice? Here is my code:

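    import java.io.{BufferedReader, IOException, InputStreamReader}
    import java.net.{MalformedURLException, URL}
    import scala.collection.JavaConverters._ // for asScala (scala.jdk.CollectionConverters._ on Scala 2.13+)
    import com.opencsv.CSVReader              // assuming the opencsv library
    import org.apache.spark.sql.SparkSession

    var stockURL: URL = null
    val spark: SparkSession = SparkSession.builder.master("local").getOrCreate
    import spark.implicits._
    val sc = spark.sparkContext
    try {
      stockURL = new URL("someurlimreadingfrom.com/asdf")
      val in: BufferedReader = new BufferedReader(new InputStreamReader(stockURL.openStream))
      val reader: CSVReader = new CSVReader(in)
      // Each CSV line becomes one Array[String] (15 fields per line here)
      val allRows: List[Array[String]] = reader.readAll.asScala.toList
      // toDF() on an RDD[Array[String]] yields ONE column, "value", of type array<string>
      val allRowsDF = sc.parallelize(allRows).toDF()
      allRowsDF.show
    } catch {
      case e: MalformedURLException =>
        e.printStackTrace()
      case e: IOException =>
        e.printStackTrace()
    }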

I had to hide the URL and the resulting DataFrame because the data is sensitive. Apologies for that.
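Since the real data is hidden, a minimal self-contained reproduction of the behavior described above might look like the sketch below. It assumes Spark 2.x or 3.x on the classpath; the object name `ReproSingleValueColumn`, the helper `toSingleColumnDF`, and the sample rows are hypothetical stand-ins, not part of the original code. It uses the same `sparkContext.parallelize(...).toDF()` conversion as the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReproSingleValueColumn {
  // Hypothetical helper: same conversion as in the question,
  // an RDD[Array[String]] passed through toDF()
  def toSingleColumnDF(spark: SparkSession, rows: List[Array[String]]): DataFrame = {
    import spark.implicits._
    spark.sparkContext.parallelize(rows).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("repro").getOrCreate()

    // Hypothetical stand-in for the hidden CSV data: 2 rows x 15 fields
    val allRows: List[Array[String]] =
      List.tabulate(2)(r => Array.tabulate(15)(c => s"r${r}c$c"))

    val allRowsDF = toSingleColumnDF(spark, allRows)
    allRowsDF.printSchema() // one column, "value", of type array<string>
    allRowsDF.show(false)   // each row holds a whole 15-element array
    spark.stop()
  }
}
```

The single column appears because Spark encodes the whole `Array[String]` element as one array-typed value rather than spreading its elements across columns.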


Solution

  • If I understand your question correctly, this piece of code should do what you want:

    It works for arrays of length 3, and you can easily extend it to 15 by adding one getItem call per index.

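    // Assumes an existing SparkSession named `spark`
    import spark.implicits._

    val allRows: List[Array[String]] =
      List(Array("a", "b", "c"), Array("a", "b", "c"))
    // toDF() yields a single array<string> column named "value"
    val df1 = spark.sparkContext.parallelize(allRows).toDF()

    // Pull each array element out into its own column
    df1
      .withColumn("col0", $"value".getItem(0))
      .withColumn("col1", $"value".getItem(1))
      .withColumn("col2", $"value".getItem(2))
      .show()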
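Extending the answer to 15 columns does not require writing 15 `withColumn` calls by hand. A minimal sketch, assuming Spark 2.x or 3.x, builds the column list with a loop and applies it in one `select`. The object name `SplitArrayColumn`, the helper `splitArrayColumn`, and the column names `col0` to `col14` are hypothetical choices for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SplitArrayColumn {
  // Hypothetical helper: expands an array<string> column into n top-level
  // columns named col0 .. col{n-1}. `arrayCol` defaults to "value", the name
  // toDF() gives a Dataset/RDD of Array[String].
  def splitArrayColumn(df: DataFrame, n: Int, arrayCol: String = "value"): DataFrame = {
    val cols = (0 until n).map(i => col(arrayCol).getItem(i).as(s"col$i"))
    // A single select keeps only the new columns, so "value" is dropped
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("split-array").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: 2 rows with 15 fields each
    val allRows: List[Array[String]] =
      List.tabulate(2)(r => Array.tabulate(15)(c => s"r${r}c$c"))

    val wideDF = splitArrayColumn(spark.sparkContext.parallelize(allRows).toDF(), 15)
    wideDF.printSchema() // 15 string columns: col0 to col14
    wideDF.show()
    spark.stop()
  }
}
```

Using one `select` instead of chained `withColumn` calls produces a single projection and removes the original array column in the same step; the trade-off is that any other columns you want to keep must be added to `cols` explicitly.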