
Why does mocking the HuggingFace datasets library not work?


I have a Python function that uses the HuggingFace datasets library to load a private dataset from HuggingFace Hub.

I want to write a unit test for that function, but pytest-mock does not seem to work: the real function keeps getting called, even though the mock appears to be set up correctly.

This is the main function:

from datasets import load_dataset


def load_data(token: str):
    dataset = load_dataset("MYORG/MYDATASET", use_auth_token=token, split="train")
    return dataset

And this is the test function I wrote:

import datetime


def test_data(mocker):
    # Mocked data
    token_test = "test_token"
    mocked_dataset = [
        {'image': [[0.5, 0.3], [0.7, 0.9]], 'timestamp': datetime.date(2023, 1, 1)},
    ]
    mocker.patch('datasets.load_dataset', return_value=mocked_dataset)

    result = load_data(token_test)

    assert len(result) == 1

Could it be that there are some "unmockable" libraries which do stuff under the hood and make their functions impossible to stub?


Solution

  • The official Python documentation covers this in the "Where to patch" section of the unittest.mock docs.

    If your module is called my_module and it does from datasets import load_dataset, then you should patch it with mocker.patch('my_module.load_dataset', return_value=mocked_dataset), so that your module uses the mock.

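    Patching datasets.load_dataset does not help here. The statement from datasets import load_dataset copies a reference to the original function into my_module's namespace at import time, while mocker.patch('datasets.load_dataset', ...) only replaces the attribute on the datasets module. If the import in your module happened before the patch (as it does when the test file imports load_data at the top), my_module.load_dataset still points to the real function.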