I have a Python function that uses the HuggingFace datasets library to load a private dataset from the HuggingFace Hub.
I want to write a unit test for that function, but pytest-mock does not seem to work: the real function keeps getting called, even though the mock setup looks correct.
This is the main function:
from datasets import load_dataset

def load_data(token: str):
    dataset = load_dataset("MYORG/MYDATASET", use_auth_token=token, split="train")
    return dataset
And this is the test function I wrote:
import datetime

def test_data(mocker):
    # Mocked data
    token_test = "test_token"
    mocked_dataset = [
        {'image': [[0.5, 0.3], [0.7, 0.9]], 'timestamp': datetime.date(2023, 1, 1)},
    ]
    mocker.patch('datasets.load_dataset', return_value=mocked_dataset)
    result = load_data(token_test)
    assert len(result) == 1
Could it be that there are some "unmockable" libraries which do stuff under the hood and make their functions impossible to stub?
The official Python documentation for unittest.mock covers this in its "Where to patch" section.
If your module is called my_module and it does from datasets import load_dataset, then you should patch the name inside your module, i.e. mocker.patch('my_module.load_dataset', return_value=mocked_dataset), so that your module uses the mock.
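Patching datasets.load_dataset is too late in that case: the from-import already bound the original function to the name load_dataset in your module's namespace when the module was imported, so replacing the attribute on the datasets module afterwards has no effect on your module.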
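The difference can be reproduced without network access or the real library. The sketch below is only an illustration under stated assumptions: fake_datasets is a hypothetical stand-in for datasets, my_module is built in memory to mimic a module that does a from-import, and the standard library's unittest.mock is used in place of pytest-mock (whose mocker.patch wraps the same mechanism).

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the `datasets` library so the sketch runs offline.
fake_datasets = types.ModuleType("fake_datasets")

def _real_load_dataset(path, **kwargs):
    # Simulates the real call that we do not want to happen in a unit test.
    raise RuntimeError("real load_dataset was called")

fake_datasets.load_dataset = _real_load_dataset
sys.modules["fake_datasets"] = fake_datasets

# Hypothetical `my_module`, built in memory; the from-import copies the
# function reference into my_module's own namespace at import time.
my_module = types.ModuleType("my_module")
exec(
    "from fake_datasets import load_dataset\n"
    "def load_data(token):\n"
    "    return load_dataset('MYORG/MYDATASET', use_auth_token=token, split='train')\n",
    my_module.__dict__,
)
sys.modules["my_module"] = my_module

mocked_dataset = [{"image": [[0.5, 0.3], [0.7, 0.9]]}]

# 1) Patching the source module: my_module still holds the original function.
with mock.patch("fake_datasets.load_dataset", return_value=mocked_dataset):
    try:
        my_module.load_data("test_token")
        patched_source_worked = True
    except RuntimeError:
        patched_source_worked = False

# 2) Patching the name where it is looked up: the mock is used.
with mock.patch("my_module.load_dataset", return_value=mocked_dataset) as fake:
    result = my_module.load_data("test_token")
    call_count = fake.call_count
```

In a pytest-mock test the second variant corresponds to mocker.patch('my_module.load_dataset', return_value=mocked_dataset), with my_module replaced by the actual module that defines load_data. If the module instead did import datasets and called datasets.load_dataset(...), patching 'datasets.load_dataset' would work, because the attribute is then looked up on the datasets module at call time.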