ScalaTest: how to assert a lengthy exception message securely and cleanly without hard-coding it?


I have the following code, which hashes columns of a Spark DataFrame with SHA-2 (SHA-256):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sha2}

object hashing {

  // Replaces each column named in hashFieldNames with its SHA-256 hex digest.
  def process(hashFieldNames: List[String])(df: DataFrame): DataFrame =
    hashFieldNames.foldLeft(df) { (acc, hashField) =>
      acc.withColumn(hashField, sha2(col(hashField), 256))
    }
}

Now, in a separate file, I am testing hashing.process with an AnyWordSpec test as follows:

"hashing.process" should {
  // some cases here that complete successfully

  "fail to hash a Spark DataFrame due to a type mismatch" in {
    import spark.implicits._ // provides toDF on RDDs of tuples

    val goodColumns = Seq("language", "usersCount", "ID", "personalData")
    val badDataSample =
      Seq(
        ("Java", "20000", 2, "happy"),
        ("Python", "100000", 3, "happy"),
        ("Scala", "3000", 1, "jolly")
      )

    val badDf =
      spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)

    // "ID" holds ints, which sha2 cannot hash
    val hashFieldNames = List("ID")

    val thrown = intercept[org.apache.spark.sql.AnalysisException] {
      hashing.process(hashFieldNames)(badDf)
    }

    // assert(thrown.getMessage === ...) // some lengthy error message that I do not want to copy-paste in its entirety
  }
}

Usually, as I understand it, one would hard-code the whole error message to ensure that it is exactly what we expect. However, the message is very lengthy, and I am wondering whether there is a better approach.

Basically, I have two questions:

a.) Is it considered good practice to match only the beginning of the error message and follow it up with a regex? I am thinking of something like this: compare thrown.getMessage against the literal prefix "cannot resolve sha2(ID, 256) due to data type mismatch: argument 1 requires binary type, however, ID is of int type.;" followed by a regex pattern such as \;(.*) for the rest.

b.) If a.) is considered a hacky approach, do you have a working suggestion for doing it properly?
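A minimal sketch of the idea in a.), using only the Scala/Java standard library rather than ScalaTest. The message text below is illustrative (shaped like the one quoted above, with a made-up plan dump after the ";"), and matchesPrefixThen is a hypothetical helper. The main pitfall it shows is that Spark messages contain regex metacharacters such as "(", ")" and ".", so the literal part has to be quoted with Pattern.quote before a regex tail is appended.

```scala
import java.util.regex.Pattern

object MessagePrefixCheck {
  // Illustrative message shaped like the one quoted in a.); the part after ";"
  // stands in for the lengthy logical-plan dump (made-up content)
  val message: String =
    "cannot resolve sha2(ID, 256) due to data type mismatch: " +
      "argument 1 requires binary type, however, ID is of int type.;\n" +
      "'Project [...]\n+- LogicalRDD [...]"

  val prefix: String =
    "cannot resolve sha2(ID, 256) due to data type mismatch: " +
      "argument 1 requires binary type, however, ID is of int type.;"

  // Hypothetical helper: the literal prefix must match exactly and the rest
  // must match tailRegex. (?s) lets "." also match the newlines in the tail.
  def matchesPrefixThen(msg: String, literalPrefix: String, tailRegex: String): Boolean =
    msg.matches("(?s)" + Pattern.quote(literalPrefix) + tailRegex)

  def main(args: Array[String]): Unit = {
    println(matchesPrefixThen(message, prefix, ".*")) // true
    // Unquoted, "(ID, 256)" becomes a regex group and no longer matches the
    // literal parentheses in the message
    println(message.matches("(?s)" + prefix + ".*")) // false
  }
}
```

When the tail is just ".*", the check reduces to message.startsWith(prefix). In a ScalaTest suite that mixes in Matchers, the equivalents are thrown.getMessage should startWith (prefix) or, for a looser check on a stable fragment, thrown.getMessage should include ("data type mismatch"). Either way the test still depends on Spark's wording, which can change between Spark versions.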

Note: the code above may contain small errors, since I adapted it for this post, but it should convey the idea.


Solution

  • You should not be asserting on exception messages (unless they are surfaced to the user, or something downstream relies on them). If throwing an exception is part of the contract, then you should throw one of a specific type with a given error code, and tests should assert that. And if it isn't, then who cares what the message says?
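A minimal sketch of the contract described above, in plain Scala so it runs without Spark: a hypothetical HashingException carries a HashingError code, and a check before hashing raises it. The schema is modeled as a Map from column name to type name purely for illustration; in a Spark version, validate would presumably inspect df.schema(field).dataType instead, and which types count as hashable here ("string", "binary") is an assumption of the sketch.

```scala
// Hypothetical error codes that are part of the hashing contract
sealed trait HashingError
object HashingError {
  case object MissingColumn extends HashingError
  case object UnsupportedType extends HashingError
}

// Callers and tests match on `error` and `column`; the message is for humans only
final class HashingException(val error: HashingError, val column: String)
    extends RuntimeException(s"$error: column '$column' cannot be hashed")

object HashingContract {
  // Types treated as hashable in this sketch (assumption)
  private val hashableTypes = Set("string", "binary")

  // `schema` maps column name -> type name (hypothetical stand-in for a StructType)
  def validate(schema: Map[String, String], hashFieldNames: List[String]): Unit =
    hashFieldNames.foreach { field =>
      schema.get(field) match {
        case None =>
          throw new HashingException(HashingError.MissingColumn, field)
        case Some(dataType) if !hashableTypes(dataType) =>
          throw new HashingException(HashingError.UnsupportedType, field)
        case Some(_) => ()
      }
    }

  // Returns the error code raised for the given fields, if any
  def errorFor(schema: Map[String, String], hashFieldNames: List[String]): Option[HashingError] =
    try { validate(schema, hashFieldNames); None }
    catch { case e: HashingException => Some(e.error) }

  def main(args: Array[String]): Unit = {
    val schema = Map(
      "language" -> "string",
      "usersCount" -> "string",
      "ID" -> "int",
      "personalData" -> "string"
    )
    println(errorFor(schema, List("ID")))       // Some(UnsupportedType)
    println(errorFor(schema, List("missing")))  // Some(MissingColumn)
    println(errorFor(schema, List("language"))) // None
  }
}
```

In the AnyWordSpec test, the assertion then no longer mentions the message at all: val e = intercept[HashingException] { hashing.process(hashFieldNames)(badDf) } followed by assert(e.error == HashingError.UnsupportedType), assuming process calls the validation before hashing.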