Tags: scala, apache-spark, scalamock

How to use ScalaMock to verify that a function was called with a certain Spark DataFrame parameter, and get useful output on failure


I've been looking at:

but have not quite got the result I want yet. Essentially, I had this test:

 scenario("myFunction reads parquet and writes to db") {
          var mockUtil: UtilitiesService = stub[UtilitiesService]
          val service = new myService(mockUtil)
          
          val expectedParquetDf = Seq(
              (999, "testData")
          ).toDF("number", "word")

          (mockUtil.getDataFrameFromParquet _).when("myParquetPath") returns Right(expectedParquetDf)
          service.publishToDatabase()
          (mockUtil.insertDataFrameIntoDb _).verify(expectedParquetDf, "myTable").once()
      }

But if that test fails (due to a DataFrame mismatch), the output isn't ideal. It simply says:

[info]   Expected:
[info]   inAnyOrder {
[info]     <stub-4> UtilitiesService.getDataFrameFromParquet(path) any number of times (called once)
[info]     <stub-4> UtilitiesService.insertDataFrameIntoPostgres[number: int, word: string] once (never called - UNSATISFIED)
[info]   }
[info]   
[info]   Actual:
[info]     <stub-4> UtilitiesService.getDataFrameFromParquet(oath)
[info]     <stub-4> UtilitiesService.insertDataFrameIntoPostgres([number: int, word: string], "myTable" (myFile.scala:28)

The string part is spot on, but the DataFrame part only shows the schema. That is useful if, say, a column is dropped, but less so if there is a bad row. Is there a nice way of improving this?

Currently my rabbit hole has led me to the code below, which still doesn't work. The "assert" functions return true only to make the "&&" part work, which makes me feel there must be a better way. Is there some comparer function I can override in the standard verify?

  def assertStringsAreEqual(expectedPath:String, actualPath:String) : Boolean = {
          assert(actualPath == expectedPath)
          true
      }

      def assertDataFramesAreEqual(expected: DataFrame, actual: DataFrame) : Boolean = {
          AssertHelpers.assertDataEqual(expected, actual) // verbose info, asserts on each row etc.
          true
      }


      scenario("myFunction reads parquet and writes to db") {
          var mockUtil: UtilitiesService = stub[UtilitiesService]
          val service = new myService(mockUtil)
          val expectedParquetDf = Seq(
              (999, "testData"),
              (898, "wrongData"),
              (999, "extraRow")
          ).toDF("number", "word")

          val incorrectExample = Seq(
              (999, "testData"),
              (999, "testData")
          ).toDF("number", "word")

          (mockUtil.getDataFrameFromParquet _).when("myParquetPath") returns Right(incorrectExample) //forced to incorrect for now
          (mockUtil.insertDataFrameIntoPostgres _)
              .expects(where { (actualDf, path) =>
                  assertDataFramesAreEqual(expectedParquetDf, actualDf) && assertStringsAreEqual("ExpectedTable", path)
              })
              .once()

          service.publishToDb()

      }

For reference I'm aiming for something like this to pop up somewhere:

Expected:
Dataframe:
[number, word]
[999, "testData"]
[898, "wrongData"]
[999, "extraRow"]

Actual:
Dataframe:
[number, word]
[999, "testData"]
[999, "testData"]
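
A message in the shape shown above could be built by a small formatting helper. The following is a minimal sketch, not part of ScalaMock or Spark: `TableDiff`, `render`, and `mismatch` are hypothetical names, and plain `Seq` values stand in for the DataFrame so the snippet has no dependencies. With Spark, the inputs would presumably come from `df.columns` and `df.collect().map(_.toSeq)`.

```scala
// Hypothetical helper that renders rows in the "Expected / Actual" layout.
// Plain Seq values stand in for a DataFrame here (an assumption for the sketch).
object TableDiff {
  // Quote strings, print every other value as-is.
  private def cell(value: Any): String = value match {
    case s: String => "\"" + s + "\""
    case other     => s"$other"
  }

  // One line for the column names, then one line per row.
  def render(columns: Seq[String], rows: Seq[Seq[Any]]): String = {
    val header = columns.mkString("[", ", ", "]")
    val body   = rows.map(_.map(cell).mkString("[", ", ", "]"))
    ("Dataframe:" +: header +: body).mkString("\n")
  }

  // None when the rows match (order-sensitive), otherwise a readable message.
  def mismatch(columns: Seq[String],
               expected: Seq[Seq[Any]],
               actual: Seq[Seq[Any]]): Option[String] =
    if (expected == actual) None
    else Some(
      s"Expected:\n${render(columns, expected)}\n\nActual:\n${render(columns, actual)}"
    )
}
```

A test could then fail with the returned message, for example `TableDiff.mismatch(cols, expectedRows, actualRows).foreach(msg => fail(msg))` in ScalaTest. The comparison is order-sensitive, so rows would need sorting first if the DataFrame order is not guaranteed.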

Solution

  • This is still not ideal, but by switching from a stub to a mock and using "expects.onCall" I can get the output I want:

    scenario("myFunction reads parquet and writes to db") {
          var mockUtil: UtilitiesService = mock[UtilitiesService]
          val service = new myService(mockUtil)
          val expectedParquetDf = Seq(
              (999, "testData"),
              (898, "wrongData"),
              (999, "extraRow")
          ).toDF("number", "word")
    
          val incorrectExample = Seq(
              (999, "testData"),
              (999, "testData")
          ).toDF("number", "word")
    
          // Set up expectations (a mock uses expects rather than when/verify)
          (mockUtil.insertDataFrameIntoPostgres _).expects(*, "ExpectedTable").onCall { (df: DataFrame, path: String) =>
            // Asserting inside the call gives the verbose row-by-row output on mismatch
            AssertHelpers.assertDataEqual(expectedParquetDf, df)
            Right(sxDbData) // sxDbData is not shown in this snippet
          }
    
          (mockUtil.getDataFrameFromParquet _).expects("myParquetPath").returns(Right(incorrectExample)) // forced to incorrect for now
    
          service.publishToDb()
    
      }
    

    Hopefully someone has a cleaner solution for this.
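
One possible alternative to asserting inside the call is to record the argument and compare it after the code under test has run. The sketch below illustrates that pattern with a hand-written recording fake rather than ScalaMock, and with plain tuples rather than a Spark DataFrame, so that it stays dependency-free. Every name in it (`TableWriter`, `RecordingWriter`, `publish`, `check`) is hypothetical and only mirrors the roles of `UtilitiesService` and `publishToDb` above.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-ins: Rows replaces the DataFrame, TableWriter replaces UtilitiesService.
object CaptureThenAssert {
  type Rows = Seq[(Int, String)]

  trait TableWriter {
    def insert(rows: Rows, table: String): Unit
  }

  // Records every call so the test can inspect the arguments afterwards.
  final class RecordingWriter extends TableWriter {
    val calls = ListBuffer.empty[(Rows, String)]
    def insert(rows: Rows, table: String): Unit = calls += ((rows, table))
  }

  // Code under test: forwards whatever it is given to the writer.
  def publish(source: Rows, writer: TableWriter): Unit =
    writer.insert(source, "ExpectedTable")

  // Returns human-readable problems; an empty result means the call matched.
  def check(writer: RecordingWriter, expected: Rows, table: String): Seq[String] =
    writer.calls.toList match {
      case (rows, t) :: Nil =>
        val tableProblem =
          if (t == table) Nil else List(s"table: expected $table, got $t")
        // Compare row by row, reporting missing rows on either side.
        val rowProblems = (0 until math.max(expected.size, rows.size)).flatMap { i =>
          val e = expected.lift(i)
          val a = rows.lift(i)
          if (e == a) None
          else Some(s"row $i: expected ${e.getOrElse("<missing>")}, got ${a.getOrElse("<missing>")}")
        }
        tableProblem ++ rowProblems
      case other =>
        List(s"expected exactly one insert call, got ${other.size}")
    }
}
```

Because the comparison happens in the test body, the failure message is whatever the test chooses to print, independent of how the mocking library renders its arguments.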