I'm working on a project with Apache Spark in Scala, and I'm running into an issue while trying to iterate over the rows of a DataFrame loaded from a CSV file and extract column values.
Here's the code I'm using:
import org.apache.spark.sql.SparkSession

object ExampleDataFrameIteration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ExampleDataFrameIteration").getOrCreate()

    // Load the CSV file
    val df = spark.read.csv("path/to/file.csv")

    // Iterate over the rows
    df.foreach { row =>
      // Extract values from columns
      val number = row.getAs[String]("number")
      val account = row.getAs[String]("account")
      // Show the values
      println(s"number: $number, Account: $account")
    }

    spark.stop()
  }
}
However, when I run the code, I don't get the expected output in the console. The println doesn't seem to display anything, even though the CSV file has valid content.
I have already checked the path of the CSV file and I'm sure it's correct. Additionally, I confirmed that the CSV file has a header row with the column names 'number' and 'account'.
Could someone help me identify what might be causing this issue and how I can fix it?
Thanks in advance for any help or suggestions!
Because Spark runs your application in a distributed fashion, the job is initiated on the driver, but actions such as foreach execute on the executors. Anything you println inside foreach is therefore written to the executor logs, not to the driver's console, which is why you see no output. To print the results on the driver, you must first bring the data back to the driver node with an action such as collect. If you call collect before foreach, as shown below, you should see the data as expected:
df.collect().foreach { row =>
  // Extract values from columns
  val number = row.getAs[String]("number")
  val account = row.getAs[String]("account")
  // Show the values
  println(s"number: $number, Account: $account")
}