Issue with iterating over DataFrame rows in Spark Scala and extracting values from a CSV file


I'm working on a project with Apache Spark in Scala and I'm facing an issue while trying to iterate over the rows of a DataFrame and extract values from columns of a CSV file.

Here's the code I'm using:

import org.apache.spark.sql.SparkSession

object ExampleDataFrameIteration {

  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder().appName("ExampleDataFrameIteration").getOrCreate()

    // Load the CSV file
    val df = spark.read.csv("path/to/file.csv")

    // Iterate over the rows
    df.foreach { row =>
      // Extract values from columns
      val number = row.getAs[String]("number")
      val account = row.getAs[String]("account")

      // Show the values
      println(s"number: $number, Account: $account")
    }

    spark.stop()
  }
}

However, when running the code, I'm not getting the expected output in the console. It seems like the println is not displaying anything, even though the CSV file has valid content.

I have already checked the path of the CSV file and I'm sure it's correct. Additionally, I confirmed that the CSV file has a header row with the column names 'number' and 'account'.

Could someone help me identify what might be causing this issue and how I can fix it?

Thanks in advance for any help or suggestions!


Solution

  • Spark is distributed: the driver submits the job, but the function you pass to df.foreach runs on the executors. On a cluster, the println calls therefore write to each executor's stdout (visible in the executor logs), not to the driver console.

    To print the rows on the driver, first use an action such as collect, which returns the rows to the driver as a local array. Keep in mind that collect loads the entire DataFrame into driver memory, so it is only suitable for small results.

    If you call collect before foreach, as shown below, you should see the data as expected:

    df.collect().foreach { row =>
      // Extract values from columns
      val number = row.getAs[String]("number")
      val account = row.getAs[String]("account")

      // Show the values
      println(s"number: $number, Account: $account")
    }
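
One more thing that may be worth checking: the question's code calls spark.read.csv without the header option, so Spark would name the columns _c0, _c1, ... and a lookup by "number" or "account" would not find a column. The sketch below reads the file with header set to true and prints only a bounded number of rows on the driver using take, which avoids pulling a large DataFrame into driver memory. The object name CollectWithHeaderSketch, the helper firstRows, the limit of 20 and the local[*] master are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession

object CollectWithHeaderSketch {

  // Hypothetical helper: reads a CSV with a header row and returns up to
  // `limit` (number, account) pairs to the driver as a local array.
  def firstRows(spark: SparkSession, path: String, limit: Int): Array[(String, String)] = {
    // header=true makes Spark use the first line as column names;
    // without it the columns are named _c0, _c1, ...
    val df = spark.read
      .option("header", "true")
      .csv(path)

    // take(limit) is an action that returns rows to the driver.
    // Without inferSchema every CSV column is read as a string.
    df.select("number", "account")
      .take(limit)
      .map(row => (row.getAs[String]("number"), row.getAs[String]("account")))
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for trying this on one machine;
    // remove .master(...) when submitting to a cluster.
    val spark = SparkSession.builder()
      .appName("CollectWithHeaderSketch")
      .master("local[*]")
      .getOrCreate()

    // The rows are now local, so println runs on the driver.
    firstRows(spark, "path/to/file.csv", 20).foreach { case (number, account) =>
      println(s"number: $number, Account: $account")
    }

    spark.stop()
  }
}
```

If you only want a quick look at the data, df.show() also prints a sample of rows on the driver without any explicit iteration.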