python-3.x pandas pytest pytest-mock

Get the input DataFrame of apply with mocking


I have the following functions:

import pandas as pd


def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(
            b=lambda x: x['b'] * 2
        )
        .apply(do_something_with_each_row, axis='columns')
    )

def do_something_with_each_row(one_row):
    # do_something_with_row
    print(one_row)

In my test, I want to inspect the dataframe produced by the chained operations and check that it is correct before do_something_with_each_row is called. That last function does not return a dataframe (it only processes each row in turn, much like iterrows).

I tried to mock the apply function like this:

# need pytest-mock and pytest
import pandas as pd


def test_not_working(mocker):
    mocked_apply = mocker.patch.object(pd.DataFrame, 'apply')
    main()

but in this case I do not get access to the dataframe that apply is called on, so I cannot test its content.

I also tried to mock the do_something_with_each_row:

# need pytest-mock and pytest
import pandas as pd


def test_not_working_again(mocker):
    mocked_to_something = mocker.patch('path.to.file.do_something_with_each_row')
    main()

This time the mock records one call per row, but every row argument contains only None values.

How can I get the dataframe on which apply is called and check that it is indeed the same as the following:

pd.DataFrame({'a': [1, 2], 'b': [2.0, 4]})

I am working with pandas 0.24.2; upgrading to pandas 1.0.5 does not change the behavior.

I searched the pandas issues but did not find anything on this subject.


Solution

  • If I understood your question correctly, one way to get the behavior you want is to patch apply with a wrapper that inspects self (the dataframe) and then delegates to the original apply:

    def test_i_think_this_is_what_you_asked(mocker):
        original_apply = pd.DataFrame.apply
        def mocked_apply(self, *args, **kw):
            # self is the pd.DataFrame at the moment apply is called
            assert len(self) == 2
            assert self.a[0] == 1
            assert self.a[1] == 3  # deliberately wrong: the actual value is 2, so this fails
            assert self.b[0] == 2.0
            assert self.b[1] == 4.0
            # delegate to the real apply so main() behaves as usual
            return original_apply(self, *args, **kw)
        mocker.patch.object(pd.DataFrame, 'apply', side_effect=mocked_apply, autospec=True)
        main()
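
A variation on the same idea, in case it helps: instead of asserting inside the wrapper, the wrapper can record the dataframe so the test compares it against the expected frame afterwards. This is only a sketch under a few assumptions. It uses the standard library's unittest.mock rather than the mocker fixture so that it runs on its own, it repeats main() with a stand-in do_something_with_each_row, and the names capture_apply_input and recording_apply are made up for illustration. Note that column a is float64 here because it held a NaN before dropna, so the expected frame uses floats.

```python
from unittest import mock

import pandas as pd
from pandas.testing import assert_frame_equal


def do_something_with_each_row(one_row):
    # Stand-in for the real per-row side effect (assumption for this sketch).
    pass


def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(b=lambda x: x['b'] * 2)
        .apply(do_something_with_each_row, axis='columns')
    )


def capture_apply_input():
    """Run main() and return the dataframes that apply was called on."""
    captured = []
    original_apply = pd.DataFrame.apply

    def recording_apply(self, *args, **kwargs):
        # Keep a copy of the frame, then delegate to the real apply.
        captured.append(self.copy())
        return original_apply(self, *args, **kwargs)

    # autospec=True makes the patched function receive self like a method.
    with mock.patch.object(
        pd.DataFrame, 'apply', side_effect=recording_apply, autospec=True
    ):
        main()
    return captured


def test_apply_receives_expected_frame():
    captured = capture_apply_input()
    assert captured, "apply was never called"
    # 'a' is float64 because it contained a NaN before dropna.
    expected = pd.DataFrame({'a': [1.0, 2.0], 'b': [2.0, 4.0]})
    assert_frame_equal(captured[0], expected)
```

With pytest-mock, the with block could presumably be replaced by the same mocker.patch.object(...) call used above. If the expected frame is built from integers as in the question, passing check_dtype=False to assert_frame_equal should relax the dtype comparison.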