mocker.patch uses data from the previous parametrized run

I am new to pytest so I might use some pytest semantics incorrectly.

In general, I am having the following issue:

I am using mark.parametrize to pass the mock data to a test. When I reuse the same variable in several parameter sets, the mock returns the data as modified by the previous run instead of the data I specified.

In detail:

In the first 'iteration', mark.parametrize passes mock_data_1 as the return value for the mocked GetData.get_data(). As expected, the test mocks the call data = GetData.get_data(arg), and the code then adds a new column, data['new_col0'], to the data.

In the second 'iteration', mark.parametrize passes mock_data_1 again, but instead of a fresh mock_data_1 the test gets the data from the previous run, which already contains the extra column.

These are some sample files:

file.py

from test_file_get_data import GetData

class MyClass:
    def new_dataset(arg):
        data = GetData.get_data(arg)  # Mock this part
        data[f'new_col{arg}'] = arg  # New column to data
        return data

test_file.py

from file import MyClass
import pandas as pd
import pytest

class TestMyClass:
    mock_data_1 = pd.DataFrame({"col_1": [1,2,3]})
    arg_1 = 0
    arg_2 = 1
    output_1 = pd.DataFrame({"col_1": [1,2,3], "new_col0": [0,0,0]})
    output_2 = pd.DataFrame({"col_1": [1,2,3], "new_col1": [1,1,1]})

    @pytest.mark.parametrize(
        'mock_arguments, arg, result',
        [
            (mock_data_1, arg_1, output_1),
            (mock_data_1, arg_2, output_2)
        ]
    )
    def test_new_dataset(self, mocker, mock_arguments, arg, result):
        mocker.patch(
            'file.GetData.get_data',
            return_value=mock_arguments,
        )
        print(mock_arguments)
        res = MyClass.new_dataset(arg)
        print(res)
        assert res.to_dict() == result.to_dict()

test_file_get_data.py

import pandas as pd

class GetData:
    def get_data(arg):
        data = pd.DataFrame({"a":[1, 2, 3]})
        return data

So the first test passes, but the second one fails because the data returned is this:

{'col_1': {0: 1, 1: 2, 2: 3},
 'new_col0': {0: 0, 1: 0, 2: 0},
 'new_col1': {0: 1, 1: 1, 2: 1}}

instead of this:

{'col_1': {0: 1, 1: 2, 2: 3},
 'new_col1': {0: 1, 1: 1, 2: 1}}

This issue can be solved if I replace data = GetData.get_data(arg) with data = GetData.get_data(arg).copy(), but I assume I am doing something wrong in the tests.

Shouldn't the data be refreshed and/or deleted after every iteration? Or is this the expected behavior?

Solution

As mentioned in the comment, the problem is that a global variable (here a class variable, which behaves the same way) is passed to the test, modified inside the test, and then reused in its modified state by the next test. The mock returns the very same DataFrame object on every call, and new_dataset adds the column to that object in place. Nothing tells pytest to reset the variable; resetting state between tests is usually done with fixtures.
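The sharing described above can be reproduced without pytest or pandas. The following is only a minimal sketch: a plain dict stands in for the DataFrame, and the names shared, get_data and new_dataset are hypothetical stand-ins for the class variable, the patched method and the production function.

```python
from unittest.mock import Mock

# Assumption: a plain dict stands in for the class-level DataFrame mock_data_1.
shared = {"col_1": [1, 2, 3]}

# A mock configured with return_value hands back the same object on every call.
get_data = Mock(return_value=shared)


def new_dataset(arg):
    # Mirrors MyClass.new_dataset: fetch the data, then mutate it in place.
    data = get_data(arg)
    data[f"new_col{arg}"] = [arg] * 3
    return data


first = new_dataset(0)
second = new_dataset(1)

# Both results are the original object, so the second call
# still sees the column added by the first one.
print(first is shared, second is shared)
print(list(second))
```

Because both calls return the one shared object, the column added in the first call is still present in the second, which matches the failure in the second parametrized test.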

If, as in your example, the parameter is the same for all tests, you don't have to pass it as a parameter at all. In this case you can use a fixture instead, which is created anew for each test:

class TestMyClass:

    arg_1 = 0
    arg_2 = 1
    output_1 = pd.DataFrame({"col_1": [1, 2, 3], "new_col0": [0, 0, 0]})
    output_2 = pd.DataFrame({"col_1": [1, 2, 3], "new_col1": [1, 1, 1]})

    @pytest.fixture
    def mock_arguments(self):
        return pd.DataFrame({"col_1": [1, 2, 3]})


    @pytest.mark.parametrize(
        'arg, result',
        [
            (arg_1, output_1),
            (arg_2, output_2)
        ]
    )
    def test_new_dataset(self, mocker, mock_arguments, arg, result):
        mocker.patch(
            'file.GetData.get_data',
            return_value=mock_arguments,
        )
        print(mock_arguments)
        ...

That is the standard way to handle variable reset in pytest.

If you want to keep the argument in parametrize as you do (for example because the parameter is not the same for all tests), you cannot use a fixture there, because the decorator arguments are already evaluated at load time. In this case you have to make sure yourself that the original argument is reset or not changed in the first place, for example by using a copy as you did, but in the test instead of the production code, which you don't want to change:

    @pytest.mark.parametrize(
        'mock_arguments, arg, result',
        [
            (mock_data_1, arg_1, output_1),
            (mock_data_1, arg_2, output_2)
        ]
    )
    def test_new_dataset(self, mocker, mock_arguments, arg, result):
        mocker.patch(
            'file.GetData.get_data',
            return_value=mock_arguments.copy(),  # use a copy of the value
        )
        print(mock_arguments)
        ...
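A possible variation on the copy approach, in case the mocked function may be called more than once within a single test, is to let the mock build a fresh copy on every call via side_effect rather than return_value. This is only a sketch under assumptions: a plain dict replaces the DataFrame, and template, get_data and new_dataset are hypothetical names.

```python
import copy
from unittest.mock import Mock

# Assumption: a plain dict stands in for the class-level DataFrame mock_data_1.
template = {"col_1": [1, 2, 3]}

# With a callable side_effect, the mock calls it with the same arguments
# and returns its result, so each call gets an independent copy.
get_data = Mock(side_effect=lambda arg: copy.deepcopy(template))


def new_dataset(arg):
    # Mirrors MyClass.new_dataset: fetch the data, then mutate it in place.
    data = get_data(arg)
    data[f"new_col{arg}"] = [arg] * 3
    return data


first = new_dataset(0)
second = new_dataset(1)

# The template is untouched and each result only has its own new column.
print(list(template), list(first), list(second))
```

With pytest-mock the same idea would presumably be written as mocker.patch('file.GetData.get_data', side_effect=lambda arg: mock_arguments.copy()), since mocker.patch forwards side_effect to the underlying mock.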