Pytest tmpdir_factory threw an error "Expected binary or unicode string, got local"


I'm using pytest to test a function that splits data into train, validation, and test sets for a machine learning problem. I create the temporary files with tmpdir_factory, but the test fails with an error like TypeError: Expected binary or unicode string, got local('/tmp/pytest/pytest-4/test_folder0/train.tfrecord'). Here is my code:

Inside conftest.py:

import pytest

DATA_FOLDER = 'test_folder'

@pytest.fixture(scope="session")
def train_dataset(tmpdir_factory):
    return tmpdir_factory.mktemp(DATA_FOLDER).join('train.tfrecord')

@pytest.fixture(scope="session")
def val_dataset(tmpdir_factory):
    return tmpdir_factory.mktemp(DATA_FOLDER).join('val.tfrecord')

@pytest.fixture(scope="session")
def test_dataset(tmpdir_factory):
    return tmpdir_factory.mktemp(DATA_FOLDER).join('test.tfrecord')

Inside the test file:

def test_split(train_dataset, val_dataset, test_dataset):
    # each argument of split_function is the path where that split is written
    split_function(train_dataset, val_dataset, test_dataset)
    # continue with assertions

Can anyone please help? Thanks


Solution

  • The tmpdir_factory fixture methods return a py.path.local object, which encapsulates a path (somewhat similar to pathlib.Path). These method calls can therefore be chained to manipulate paths, as your fixtures do with mktemp().join(). To get a str path back from the result, you have to explicitly convert the py.path.local to str:

    @pytest.fixture(scope="session")
    def train_dataset(tmpdir_factory):
        return str(tmpdir_factory.mktemp(DATA_FOLDER).join('train.tfrecord'))
    

    Since the functions under test generally don't know about py.path.local, converting the paths created by tmpdir_factory to str before passing them on is the usual way to use this fixture.
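
To see why the conversion matters, here is a minimal sketch that runs without pytest or TensorFlow. It makes two assumptions: strict_writer is a hypothetical stand-in for a writer that only accepts str or bytes paths (it is not the real TFRecord API), and pathlib.Path plays the role of py.path.local as the path object.

```python
import os
import tempfile
from pathlib import Path


def strict_writer(path, payload):
    """Hypothetical stand-in for a writer that accepts only str or bytes paths."""
    if not isinstance(path, (str, bytes)):
        raise TypeError("Expected binary or unicode string, got %r" % (path,))
    with open(path, "wb") as handle:
        handle.write(payload)


def demo():
    with tempfile.TemporaryDirectory() as folder:
        # A path object, analogous to what mktemp().join() returns.
        path_object = Path(folder) / "train.tfrecord"

        # Passing the path object directly is rejected, as in the question.
        try:
            strict_writer(path_object, b"data")
            rejected = False
        except TypeError:
            rejected = True

        # Converting to str first satisfies the strict writer.
        strict_writer(str(path_object), b"data")
        written = path_object.read_bytes()

        # os.fspath yields the same string for an os.PathLike object.
        same = os.fspath(path_object) == str(path_object)
    return rejected, written, same
```

Calling demo() shows the path object being rejected and the str version being accepted. Doing the conversion inside the fixture, as in the answer above, keeps the test body free of such details.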