Tags: python, pandas, pytest, fixtures, parametrized-testing

pytest: use fixture with pandas dataframe for parametrization


I have a fixture that returns a pd.DataFrame. I need to pass its individual columns (pd.Series) to a unit test, and I would like to use parametrize for that.

Here's a toy example without parametrize. It checks every column of the DataFrame, but because the loop lives inside a single test function, pytest runs and reports only 1 test. I am looking for 3 separate tests, without the for-loop. I also suspect I can get rid of the input_series fixture, can't I?

import numpy as np
import pandas as pd
import pytest


@pytest.fixture(scope="module")
def input_df():
    return pd.DataFrame(
        data=np.random.randint(1, 10, (5, 3)), columns=["col1", "col2", "col3"]
    )


@pytest.fixture(scope="module")
def input_series(input_df):
    return [input_df[series] for series in input_df.columns]


def test_individual_column(input_series):
    for series in input_series:
        assert len(series) == 5

I am basically looking for something like this:

@pytest.mark.parametrize("series", individual_series_from_input_df)
def test_individual_column(series):
    assert len(series) == 5

Solution

  • A fixture can yield only once. If you try to produce several values from one fixture by looping over the result of another fixture and yielding each item, pytest fails with the error "yield_fixture function has more than one 'yield'".

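    One solution is fixture parametrization: pytest then runs every test that uses the fixture once per parameter. In your case you want to iterate over columns, so the DataFrame's column names are the parameters. Parameters are evaluated at collection time, before any fixture runs, so the DataFrame has to be defined at module level rather than inside a fixture.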

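    import numpy as np
    import pandas as pd
    import pytest

    # Test data, defined at module level so it exists when pytest collects the tests
    input_df = pd.DataFrame(
        data=np.random.randint(1, 10, (5, 3)), columns=["col1", "col2", "col3"]
    )


    @pytest.fixture(
        scope="module",
        params=input_df.columns,  # one parameter per column name
    )
    def input_series(request):
        # request.param holds the column name for the current run
        column = request.param
        yield input_df[column]


    def test_individual_column(input_series):
        assert len(input_series) == 5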
    

    This generates one test per column of the test DataFrame.

    pytest -v test_pandas.py
    # test_pandas.py::test_individual_column[col1] PASSED
    # test_pandas.py::test_individual_column[col2] PASSED
    # test_pandas.py::test_individual_column[col3] PASSED
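
If you would rather keep the DataFrame inside a fixture, as in the question, one option is to parametrize over the column names only, since those are the only thing pytest needs at collection time. The sketch below is an illustration under that assumption, not the only way to do it. It shows two variants: a parametrized fixture that looks the column up in input_df, and pytest.mark.parametrize applied directly to the test. The names COLUMNS, make_input_df and test_individual_column_by_name are hypothetical and do not appear in the original post.

```python
import numpy as np
import pandas as pd
import pytest

# Hypothetical constant: the column names must be known at collection time,
# but the DataFrame itself can still be built lazily inside a fixture.
COLUMNS = ["col1", "col2", "col3"]


def make_input_df():
    # Hypothetical helper that builds the same toy data as in the question.
    return pd.DataFrame(
        data=np.random.randint(1, 10, (5, len(COLUMNS))), columns=COLUMNS
    )


@pytest.fixture(scope="module")
def input_df():
    return make_input_df()


# Variant 1: a parametrized fixture that depends on the input_df fixture.
@pytest.fixture(params=COLUMNS)
def input_series(request, input_df):
    # request.param is the column name for the current run
    return input_df[request.param]


def test_individual_column(input_series):
    assert len(input_series) == 5


# Variant 2: parametrize the test itself and select the column in the test body.
@pytest.mark.parametrize("column", COLUMNS)
def test_individual_column_by_name(input_df, column):
    assert len(input_df[column]) == 5
```

Both variants should be collected as three tests each, one per entry in COLUMNS. The trade-off is that the column names are written out twice conceptually (once for collection, once in the data), so they have to be kept in sync with whatever builds the DataFrame.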