I have following method to write into cassandra some time it saving data fine. When I run it again , sometimes it is throwing NullPointerException Not sure what is going wrong here ... Can you please help me.
'
@throws(classOf[IOException])
def writeDfToCassandra(o_model_family:DataFrame , keyspace:String, columnFamilyName: String) = {
logger.info(s"writeDfToCassandra")
o_model_family.write.format("org.apache.spark.sql.cassandra")
.options(Map( "table" -> columnFamilyName, "keyspace" -> keyspace ))
.mode(SaveMode.Append)
.save()
}
'
18/10/29 05:23:56 ERROR BMValsProcessor: java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
at java.util.regex.Matcher.reset(Matcher.java:309)
at java.util.regex.Matcher.<init>(Matcher.java:229)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at scala.util.matching.Regex.findFirstIn(Regex.scala:388)
at org.apache.spark.util.Utils$$anonfun$redact$1$$anonfun$apply$15.apply(Utils.scala:2698)
at org.apache.spark.util.Utils$$anonfun$redact$1$$anonfun$apply$15.apply(Utils.scala:2698)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.util.Utils$$anonfun$redact$1.apply(Utils.scala:2698)
at org.apache.spark.util.Utils$$anonfun$redact$1.apply(Utils.scala:2696)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.util.Utils$.redact(Utils.scala:2696)
at org.apache.spark.util.Utils$.redact(Utils.scala:2663)
at org.apache.spark.sql.internal.SQLConf$$anonfun$redactOptions$1.apply(SQLConf.scala:1650)
at org.apache.spark.sql.internal.SQLConf$$anonfun$redactOptions$1.apply(SQLConf.scala:1650)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.internal.SQLConf.redactOptions(SQLConf.scala:1650)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.simpleString(SaveIntoDataSourceCommand.scala:52)
at org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:178)
at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$4.apply(QueryExecution.scala:198)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$4.apply(QueryExecution.scala:198)
at org.apache.spark.sql.execution.QueryExecution.stringOrError(QueryExecution.scala:100)
at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:198)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
at com.snp.utils.DbUtils$.writeDfToCassandra(DbUtils.scala:47)
Oddly this is failing in the "redact" function of the Spark Utils. This is used on options that are potentially passed to Spark to remove sensitive data from UI's and such. I can't imagine why a null key-name would pop up in your SqlConf (since I believe you can only have Empty Strings) but I would check there. Could be a mutation of the conf while the method is being executed?