Search code examples
excelvbarubberduck

Test Module VBA


i'm trying to write a test module to test one of the modules I wrote in VBA. In specific, I have a if statement I would like to trigger using the test module by giving the module/funtion the wrong initial parameters. The module/function I would like to test is:

Function TPR_TNR_FPR_FNR(expected_vals As Range, pred_vals As Range, 
val_tested As Integer) As Double

If WorksheetFunction.CountA(expected_vals) <> 
WorksheetFunction.CountA(pred_vals) Then
   MsgBox "Cells in Expected_vals and pred_vals must be the same in length"
   Stop
End If

count_all = 0
For Each cell In expected_vals
  If cell = val_tested Then
    count_all = count_all + 1
  End If
Next cell

count_correct = 0
For i = 1 To expected_vals.Cells.Count
  If (expected_vals.Cells(i).Value = pred_vals.Cells(i).Value) And 
(expected_vals.Cells(i).Value = val_tested) Then
     count_correct = count_correct + 1
  End If
Next

TPR_TNR_FPR_FNR = count_correct / count_all

End Function

And my test module is:

 '@TestModule
 Private Assert As Rubberduck.AssertClass

 '@TestMethod
 Public Sub Test1()
 'Arrange
 Const expected As String = "Cells in Expected_vals and pred_vals must be 
 the same in length"
 Dim actual As String

 'Act
 Dim r1, r2 As Variant
    r1 = 
 WorksheetFunction.Transpose(Application.ActiveSheet.Range("A1:A5").Select)
    r2 = 
 WorksheetFunction.Transpose(Application.ActiveSheet.Range("B1:B4").Select)
 actual = Module1.TPR_TNR_FPR_FNR(r1, r2, 0)

 'Assert
 Assert.AreEqual expected, actual, "Expected MsgBox not received"
 End Sub

However I get the error "Byref argument type mismatch" for the r1 variant when the test script gets to "actual=...". Please assist me, I don't know what I'm doing wrong. I have successfully installed Rubberduck already.


Solution

  • First of all, kudos for testing your VBA code. Professional developers in every language write unit tests, and with Rubberduck (disclaimer: I manage that project) you're stepping up your game and contributing to make VBA less of a dreaded language.

    Not all code is testable though. In order to write unit tests against a function, that function needs to be written in such a way that coupling is reduced to a minimum, and its dependencies are ideally taken in as parameters.

    The One Thing that definitely makes a function untestable, is when that function involves user interaction. MsgBox pops a modal window that needs to be dismissed manually, so testable code avoids it1. Stop is debugger code that shouldn't be in production, and prevents execution of a test as well.


    You're hit by a bus, or move on to pursue new challenges elsewhere, and someone now needs to take over that code tomorrow. Will they curse your name, or praise your work?

    I can't read TPR_TNR_FPR_FNR and immediately figure out what it does just by its name. That's a problem, because it makes maintenance much harder than it needs to be: if we don't know what a function is supposed to be doing, how do we know it's doing it right? With a suite of well-named tests, we can know how it behaves in all cases... assuming well-named tests. Test1 doesn't tell us much, beyond well it's testing something.

    First ditch the MsgBox and Stop statement - throw an error in that guard clause instead:

    If WorksheetFunction.CountA(expected_vals) <> WorksheetFunction.CountA(pred_vals) Then
        Err.Raise 5, "TPR_TNR_FPR_FNR", "Cells in Expected_vals and pred_vals must be the same in length"
    End If
    

    Note that this doesn't compare the number of rows and/or columns of each range; only that they have the same number of non-empty cells. Just with that one Err.Raise statement, I can think of several unit tests to write:

    • Given same-size ranges with the same number of non-empty cells, no error is thrown.
    • Given same-size ranges with different number of non-empty cells, error 5 is thrown.
    • Given different-size ranges with same number of non-empty cells, no error is thrown.
    • Given different-size ranges with different number of non-empty cells, error 5 is thrown.
    • Given non-adjacent ranges with the same number of non-empty cells, no error is thrown.
    • Given two ranges without any non-empty cells, no error is thrown.

    If any of these statements doesn't look right, then your code isn't working as intended - because all these tests would pass, given the error is thrown when WorksheetFunction.CountA returns a different value for the two ranges.

    Passed the guard clause, the function proceeds to iterate the cells in expected_vals what have a value matching the val_tested parameter.

    The function is working with Range objects, iterating cells, implicitly comparing Range.[_Default] (Value) against an Integer value: if any of the cells in expected_vals contains an error, a Type Mismatch error is thrown here:

    If cell = val_tested Then
    

    Because the above is really doing this:

    If cell.Value = val_tested Then
    

    Range.Value is a Variant that can hold any value: numeric values are Variant/Double, so even in the "happy path" there's an implicit conversion going on, in order to compare that Double with the provided Integer. Looks like val_tested should be a Double.

    But Range.Value can also be Variant/Error, and that variant subtype can't be compared to any other type without throwing a type mismatch. If throwing that type mismatch is expected, there should be a test for it. Otherwise, it should be handled - and then there should be a test for it:

    • Given an error value in expected_vals, throws error 13 (or not?)

    If that error shouldn't be happening, then the function needs to actively prevent it:

    For Each cell In expected_vals
        If Not IsError(cell.Value) Then
            If cell.Value = val_tested Then count_all = count_all + 1
        End If
    Next
    

    So count_all is really the number of cells in expected_vals that have a value that matches the supplied val_tested parameter: I believe matchingExpectedValuesCount would be a more descriptive/meaningful name for it, and it should be declared locally with a Dim statement (Rubberduck inspections should be warning you about it.. and a couple other things).

    Next we have a For loop that makes a surprising assumption:

    For i = 1 To expected_vals.Cells.Count
        If (expected_vals.Cells(i).Value = pred_vals.Cells(i).Value) And (expected_vals.Cells(i).Value = val_tested) Then
    

    We're now assuming a very specific shape for the supplied ranges. If we made it this far with a 2-column range, or a non-contiguous multiple-area range, this is where we're going to blow up.

    The guard clause needs to guard against that assumption, and throw an error accordingly. WorksheetFunction.CountA / the number of non-empty cells in each provided range, isn't enough to properly guard against bad inputs. Something like this should be more accurate:

    If expected_vals.Rows.Count <> pred_vals.Rows.Count _
        Or expected_vals.Columns.Count <> 1 _
        Or pred_vals.Columns.Count <> 1 _
    Then
        Err.Raise 5, "TPR_TNR_FPR_FNR", "Invalid inputs"
    End If
    

    Now the assumptions would be:

    • Given same-size ranges with the same number of cells, no error is thrown.
    • Given same-size ranges with different number of cells, error 5 is thrown.
    • Given different-size ranges with same number of cells, error 5 is thrown.
    • Given different-size ranges with different number of cells, error 5 is thrown.
    • Given non-adjacent ranges with the same number of non-empty cells, error 5 is thrown.
    • Given two ranges without any non-empty cells, no error is thrown.

    Now with that settled, the 2nd loop must also handle Variant/Error to prevent Type Mismatch errors.

    If Not IsError(expected_vals.Cells(i).Value) _
        And Not IsError(pred_vals.Cells(i).Value) _
    Then
        If (expected_vals.Cells(i).Value = pred_vals.Cells(i).Value) And (expected_vals.Cells(i).Value = val_tested) Then
            count_correct = count_correct + 1
        End If
    End If
    

    Lastly, the assignment of result of the function is going to throw a division by zero error if count_all is 0:

    TPR_TNR_FPR_FNR = count_correct / count_all
    

    If that's expected, there should be a test for it. Otherwise, it should be guarded against, a surrogate value should be returned (e.g. -1, or 0), ...and there should be a test for it!

    • Given no cells in expected_vals match the supplied val_tested value, error 11 is thrown.

    Or..

    • Given no cells in expected_vals match the supplied val_tested value, returns 0.

    Writing the tests

    For every single "Given..., ..." bullet above, a test should be written to prove it. Your test has a number of already-identified issues, and a number of unidentified ones, too.

    The secret sauce to writing good tests, is controlling the inputs. Having Excel.Range parameters is making it harder than necessary: now you need to have some test sheet with an actual test range with a bunch of test values, ...and it's a nightmare, because now whether the tests pass or fail depends on things that aren't in the tests themselves - and that's very bad: good tests should have reliable, reproducible, consistent results.

    I haven't seen anything in that function that says it needs to work with Range parameters. In fact, working with plain arrays would make it significantly more efficient, and much easier to assert the assumptions in the guard clause - just check the array bounds! Working with plain arrays also means the tests can now be self-contained: the test setup code can easily define test arrays to provide the function with, especially since we've established that these arrays need to be 1-dimensional.

    So the function needs to be rewritten to work with Variant arrays instead.

    Once that's done (I'll leave that part to you!), you can easily setup all required inputs for all tests, and Rubberduck's test templates make that fairly easy. Here's what one of these tests could look like:

    '@TestMethod
    Public Sub GivenDifferentSizeArrays_Throws()
        Const ExpectedError As Long = 5
        On Error GoTo TestFail
    
        'Arrange:
        Dim expectedValues As Variant
        expectedValues = Array(1, 2, 3)
    
        Dim predValues As Variant
        predValues = Array(1, 2, 3, 4)
    
        'Act:
        Dim result As Double
        result = TPR_TNR_FPR_FNR(expectedValues, predValues, 1)
    
    Assert:
        Assert.Fail "Expected error was not raised."
    
    TestExit:
        Exit Sub
    TestFail:
        If Err.Number = ExpectedError Then
            Resume TestExit
        Else
            Resume Assert
        End If
    End Sub
    

    This test (note that it requires the function to be modified to take two variant arrays, not Range parameters) expects error 5 to be raised by the function call, given two differently-sized arrays: if the expected error isn't raised, the test fails. If it is, the test passes.

    Another test could validate that error 13 is thrown given an error value in one of the cells - here an #N/A cell error value:

        'Arrange:
        Dim expectedValues As Variant
        expectedValues = Array(1, 2, 3)
    
        Dim predValues As Variant
        predValues = Array(CVErr(xlErrNA), 2, 3)
    

    And so on, until all thinkable edge cases are covered: if your tests are all meaningfully named, you can know exactly how your function is expected to behave, by simply reading the names of the tests in Rubberduck's test explorer, and with a single click run the whole suite, seeing them all turn green, proving that the function works exactly as intended - even after you made changes to it.


    Making assumptions explicit

    Here's a rewritten version of your function, that makes its assumptions explicit and should be much easier to write tests against:

    Public Function TPR_TNR_FPR_FNR(ByRef expected_vals As Variant, ByRef pred_vals As Variant, ByVal val_tested As Double) As Double
    
        Dim workValues As Variant
        Dim predValues As Variant
    
        If Not IsArray(expected_vals) Or Not IsArray(pred_vals) Then
            Err.Raise 5, "TPR_TNR_FPR_FNR", "Parameters must be arrays."
        Else
            workValues = expected_vals
            predValues = pred_vals
        End If
    
        If TypeOf expected_vals Is Excel.Range Then
            If expected_vals.Columns.Count <> 1 Then Err.Raise 5, "TPR_TNR_FPR_FNR", "'expected_vals' must be a single column."
            workValues = Application.WorksheetFunction.Transpose(expected_vals)
        End If
    
        If TypeOf pred_vals Is Excel.Range Then
            If pred_vals.Columns.Count <> 1 Then Err.Raise 5, "TPR_TNR_FPR_FNR", "'pred_vals' must be a single column."
            predValues = Application.WorksheetFunction.Transpose(pred_vals)
        End If
    
        If UBound(workValues) <> UBound(predValues) Then
            Err.Raise 5, "TPR_TNR_FPR_FNR", "'expected_vals' and 'pred_vals' must be the same size."
        End If
    
        Dim matchingExpectedValuesCount As Long
        Dim currentIndex As Long
        For currentIndex = LBound(workValues) To UBound(workValues)
            If workValues(currentIndex) = val_tested Then
                matchingExpectedValuesCount = matchingExpectedValuesCount + 1
            End If
        Next
    
        If matchingExpectedValuesCount = 0 Then
            TPR_TNR_FPR_FNR = 0
            Exit Function
        End If
    
        Dim count_correct As Long
        For currentIndex = LBound(predValues) To UBound(predValues)
            If workValues(currentIndex) = predValues(currentIndex) And workValues(currentIndex) = val_tested Then
                count_correct = count_correct + 1
            End If
        Next
    
        TPR_TNR_FPR_FNR = count_correct / matchingExpectedValuesCount
    
    End Function
    

    Note that I'm not 100% clear on the purpose of everything, so I've left a number of identifiers as you have them - I'd warmly recommend renaming them though.


    1 Rubberduck's unit testing features include a "fakes" API that lets you configure a test and literally hijack MsgBox (and several others) calls, allowing you to write a test for a procedure that normally pops a message box, without ever displaying it while the test is running. The API also lets you configure its return value, so you can e.g. test what happens when the user clicks "Yes", and then another test can confirm what happens when the user clicks "No".