I have a scala/spark program that is used to validate xmls file in an input directory and then writes the report to another input parameter (local filesystem path to write report to).
As per the requirements from stakeholders this program is to run on local machines hence I am using spark in local mode. Till now things were fine, i was using the code below to save my report to a file
dataframe.repartition(1)
.write
.option("header", "true")
.mode("overwrite")
.csv(reportPath)
However this required winutils to be installed/configured on the machines running my program.
Given we use cloudera updates very often, there was an overhead of changing winutils after evry update as we would be updating the jars to the latest version in our pom file. Hence, I have been asked to remove the dependency on winutils
On a quick google search and after coming across How to save Spark RDD to local filesystem I decided to change the above pice of code to
val outputRdd = dataframe.rdd
val count = outputRdd.count()
println("\nCount is: " + count + "\n")
println("\nOutput path is: " + reportPath + "\n")
outputRdd.coalesce(1).saveAsTextFile(reportPath)
However, on running the code I am now getting this error
Count is: 15
Output path is: C:\\codingdir\\test\\report
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.mapred.JobContextImpl.<init>(Lorg/apache/hadoop/mapred/JobConf;Lorg/apache/hadoop/mapreduce/JobID;)V from class org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil
at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.createJobContext(SparkHadoopWriter.scala:178)
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:67)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1067)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:957)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1499)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1478)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1478)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1478)
at com.optus.dcoe.hawk.XmlParser$.delayedEndpoint$com$optus$dcoe$hawk$XmlParser$1(XmlParser.scala:120)
at com.optus.dcoe.hawk.XmlParser$delayedInit$body.apply(XmlParser.scala:16)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.optus.dcoe.hawk.XmlParser$.main(XmlParser.scala:16)
at com.optus.dcoe.hawk.XmlParser.main(XmlParser.scala)
I have tried changing the value of reportPath varible to C:\codingdir\test\report file://C:/codingdir/test/report file://C:/codingdir/test/report
and other values as suggested on
and other links but I am still getting same error
I have found these articles about java.lang.IllegalAccessError but not sure how do i get around this error:
Can someone please help me in resolving this?
Env variable HADOOP_HOME pertaining to winutls has been removed. winutils entry has been removed from PATH variable I am using java 8 on windows 10 (all the users of the program would be on similar laptops) Spark version is 2.4.0-cdh6.2.1
Finally found out the issue, It was caused by some unwanted mapreduce related dependencies which have now been removed and I have moved to another error now