I have the following functions:
import pandas as pd

def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(
            b=lambda x: x['b'] * 2
        )
        .apply(do_something_with_each_row, axis='columns')
    )

def do_something_with_each_row(one_row):
    # do_something_with_row
    print(one_row)
In my test, I want to look at the dataframe built after all the chained operations and check that everything is fine with it before do_something_with_each_row is called. This last function does not return a dataframe (it just iterates over all rows, similarly to iterrows).
I tried to mock the apply function like this:
# need pytest-mock and pytest
import pandas as pd
from path.to.file import main  # placeholder module path

def test_not_working(mocker):
    mocked_apply = mocker.patch.object(pd.DataFrame, 'apply')
    main()
but in this case, I don't get access to the dataframe on which apply is called, so I cannot test its content.
I also tried to mock do_something_with_each_row:
# need pytest-mock and pytest
import pandas as pd
from path.to.file import main  # placeholder module path

def test_not_working_again(mocker):
    mocked_to_something = mocker.patch('path.to.file.do_something_with_each_row')
    main()
This time I do get all the calls with their row arguments, but the rows all contain None values.
How can I get the dataframe on which apply is called and check that it is indeed the same as the following:
pd.DataFrame({'a': [1, 2], 'b': [2.0, 4]})
I am working with pandas 0.24.2; upgrading to pandas 1.0.5 does not change the behavior.
I searched the pandas issues but didn't find anything on this subject.
If I understood your question correctly, this is one way to get the behavior you want:
def test_i_think_this_is_what_you_asked(mocker):
    # Keep a reference to the real method so the wrapper can delegate to it.
    original_apply = pd.DataFrame.apply

    def mocked_apply(self, *args, **kw):
        # self is the pd.DataFrame at the time apply is called
        assert len(self) == 2
        assert self.a[0] == 1
        assert self.a[1] == 3  # this will fail because the value is 2
        assert self.b[0] == 2.0
        assert self.b[1] == 4.0
        return original_apply(self, *args, **kw)

    # autospec=True makes the mock receive the DataFrame instance as self.
    mocker.patch.object(pd.DataFrame, 'apply', side_effect=mocked_apply, autospec=True)
    main()
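If you would rather compare the whole dataframe against the expected one instead of asserting cell by cell, the same idea can be sketched as follows. This is only a sketch under a few assumptions: it uses unittest.mock from the standard library instead of pytest-mock so that it is self-contained, the helper name capture_apply_input is made up for illustration, and do_something_with_each_row is replaced by a do-nothing stand-in. Note that column a is float64 after dropna (the NaN forced it to float), while the expected frame in the question has an int64 column, so the comparison below passes check_dtype=False.

```python
from unittest import mock

import pandas as pd
from pandas.testing import assert_frame_equal


def do_something_with_each_row(one_row):
    # Stand-in for the real per-row function (assumption for this sketch).
    pass


def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(b=lambda x: x['b'] * 2)
        .apply(do_something_with_each_row, axis='columns')
    )


def capture_apply_input():
    """Run main() and return a copy of the DataFrame on which apply was called.

    Hypothetical helper, not part of pandas or pytest-mock.
    """
    captured = []
    original_apply = pd.DataFrame.apply

    def recording_apply(self, *args, **kwargs):
        # Store a copy so later mutations cannot change what we inspect.
        captured.append(self.copy())
        return original_apply(self, *args, **kwargs)

    # autospec=True makes the mock receive the DataFrame instance as `self`.
    with mock.patch.object(pd.DataFrame, 'apply', autospec=True,
                           side_effect=recording_apply):
        main()
    # The first recorded call is the one made by main().
    return captured[0]


def test_frame_passed_to_apply():
    expected = pd.DataFrame({'a': [1, 2], 'b': [2.0, 4]})
    # Compare values only: 'a' is float64 in the real frame, int64 in expected.
    assert_frame_equal(capture_apply_input(), expected, check_dtype=False)
```

Capturing the frame and asserting outside the wrapper keeps the assertions in the test body, which tends to give more readable failure messages than asserting inside the side effect.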