scala, dataframe, apache-spark, apache-spark-sql, case-class

Unable to create dataframe from a textfile using case class in spark scala


I have a dataset in text-file format and I am trying to create a DataFrame from it using a case class, but I am getting the error below:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match. Old column names (1): value New column names (4): Name, Age, Department, Salary

These are the first three lines of my dataset:

 Name,Age,Department,Salary
 Sohom,30,TD,9000000
 Aminul,32,AC,10000000

The code I am using is:

import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark.sql.SparkSession
case class Record(Name: String, Age :Int, Department: String, Salary: Int)
object airportDetails {

    def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Spark SQL val basic example").config("spark.master", "local").getOrCreate()
    spark.sparkContext.setLogLevel("OFF")
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)
    import spark.implicits._

    val input = spark.sparkContext.textFile("file:///C:/Users/USER/Desktop/SparkDocuments/airport_dataset.txt")
      .map(line => line.split(",").map(x => Record(x(0).toString,x(1).toInt,x(2).toString,x(3).toInt)))
    val input1 = input.toDF("Name", "Age", "Department", "Salary")

    input1.show()

    }
}

Solution

  • The error happens because `line.split(",").map(...)` maps over every *field* of the split line, so each line becomes an `Array[Record]` rather than a single `Record`; calling `toDF` on that RDD yields one column named `value`, which cannot be renamed to four columns. Rather than parsing by hand, you can just use the Spark DataFrame CSV reader and cast the result to a `Dataset[Record]`:

    case class Record(Name: String, Age: Int, Department: String, Salary: Int)
    
    val ds = spark.read.option("header", true)
                       .option("inferSchema", true)
                       .csv("file:///C:/Users/USER/Desktop/SparkDocuments/airport_dataset.txt")
                       .as[Record]
    

    If you want a DataFrame instead, you can use toDF (the column names are optional here, since they already match the case-class fields):

    val df = ds.toDF("Name", "Age", "Department", "Salary")
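
    For reference, the RDD approach in the question can also work if each line is split once and mapped to a single Record, with the header row filtered out first. A minimal sketch of the corrected parsing step (the Spark calls are shown commented, assuming the same SparkSession as in the question):

    ```scala
    case class Record(Name: String, Age: Int, Department: String, Salary: Int)

    // Split the line once and build ONE Record from the four fields.
    // (The original code mapped over the split result, which created a
    // Record per field instead of one per line.)
    def parseLine(line: String): Record = {
      val f = line.split(",")
      Record(f(0), f(1).trim.toInt, f(2), f(3).trim.toInt)
    }

    // With the question's SparkSession in scope:
    // val lines  = spark.sparkContext.textFile("file:///C:/Users/USER/Desktop/SparkDocuments/airport_dataset.txt")
    // val header = lines.first()   // "Name,Age,Department,Salary"
    // val df     = lines.filter(_ != header).map(parseLine).toDF()
    ```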