I'm working through the Kaggle Titanic example using Spark ML and Scala. When I try to load the first training file, I run into a strange error:
java.io.IOException: Could not read footer: java.lang.RuntimeException: file:/Users/jake/Development/titanicExample/src/main/resources/data/titanic/train.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [44, 81, 13, 10]
The file is a .csv, so I'm not sure why it's expecting a Parquet file.
Here is my code:
import org.apache.spark.sql.SparkSession

object App {
  // Local session using all available cores
  val spark = SparkSession
    .builder()
    .master("local[*]")
    .appName("liveOrDie")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    // Read the training file, treating the first row as column names
    val rawTrainingData = spark.read
      .option("header", "true")
      .option("delimiter", ",")
      .option("inferSchema", "true")
      .load("src/main/resources/data/titanic/train.csv")
    // rawTrainingData.show()
  }
}
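For comparison, here is a minimal sketch of the same read with the CSV source named explicitly. When no format is given, `load` falls back to the default data source (`spark.sql.sources.default`, which is Parquet unless it has been overridden), so spelling out the format removes that ambiguity. The object name `CsvLoadSketch` is hypothetical, and the sketch assumes Spark 2.x or later, where `DataFrameReader.csv` is available.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object name, used only for this illustration
object CsvLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("liveOrDie")
      .getOrCreate()

    // .csv(path) is shorthand for .format("csv").load(path)
    val rawTrainingData: DataFrame = spark.read
      .option("header", "true")      // first row holds the column names
      .option("inferSchema", "true") // let Spark guess the column types
      .csv("src/main/resources/data/titanic/train.csv")

    rawTrainingData.printSchema()
    spark.stop()
  }
}
```

Writing `.format("csv").load(...)` instead of `.csv(...)` should behave the same way; which one to use is mostly a matter of taste.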
The problem seems to have been a conflict between Scala versions in my pom.xml, NOT my original code. My pom.xml pulled in dependencies built against several different Scala versions, which seemed to be causing the issue. I updated every dependency that uses Scala to the same version through a shared property, <scala.dep.version>2.11</scala.dep.version>, and that fixed the problem.
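As a rough sketch of what that can look like in a pom.xml: the property is declared once and reused in the artifact ID of each Scala-dependent dependency. Only `scala.dep.version` comes from my setup as described above; the `spark.version` property, its value, and the particular dependencies listed are assumptions for illustration.

```xml
<properties>
  <!-- Single place to set the Scala binary version -->
  <scala.dep.version>2.11</scala.dep.version>
  <!-- Assumed Spark version, for illustration only -->
  <spark.version>2.2.0</spark.version>
</properties>

<dependencies>
  <!-- Each Scala-dependent artifact takes its suffix from the shared property -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.dep.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_${scala.dep.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```

Running `mvn dependency:tree` afterwards is one way to check that no artifact with a different Scala suffix is still being pulled in transitively.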