
Monkeypatch Extract step in ETL data pipeline for functional testing


Consider an ETL pipeline repo structured like this:

etl_repo  
├── app  
     ├── extract  
         ├── extr_a.py  
         ├── extr_b.py   
     ├── transform  
         ├── trans_a.py  
         ├── trans_b.py  
     ├── load  
         ├── load_a.py  
         ├── load_b.py  
     ├── config.py  
     ├── my_job1.py 
├── tests  
     ├── test_my_job1.py   

On a production server, I run python app/my_job1.py on a periodic basis. The job(s) import functions from the different ETL modules stored in the repo (extract, transform and load). I have unit test coverage for the ETL modules, but I would like functional (end-to-end) testing for the actual job(s).

I learned about pytest's monkeypatch fixture, which lets me load static data instead of relying on the network resources my extract step uses. It is working as expected.

However, I cannot figure out the best way to monkeypatch my extract modules and have the test execute the python app/my_job1.py command, as if it were in production.

I would like to avoid copying the full job into a separate test function that uses the monkeypatch fixture. Although that technically works, it would be painful to modify both the job and its test every time the job changes.

The functional test has to be as close as possible to what the production system is doing.
I tried using subprocess to create a child process from inside the test method, but the child process does not inherit the monkeypatched imports.
I would also like to avoid injecting test code or imports into my_job1.py behind conditions like if Config.ETL_ENV == "TEST", so that production code and test code stay cleanly separated.
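The subprocess behaviour described above can be reproduced in isolation. The sketch below is not the job itself: it patches a standard-library attribute (string.digits, chosen only because it is harmless) in the current process and then starts a fresh interpreter, which re-imports the module and sees the original value. monkeypatch.setattr changes objects in the current process in the same way, so a child process would not see those changes either.

```python
import string
import subprocess
import sys

original = string.digits
string.digits = "patched"  # in-process patch, comparable to monkeypatch.setattr
try:
    # The parent process sees the patched value.
    in_parent = string.digits

    # A child interpreter imports its own fresh copy of the module.
    child = subprocess.run(
        [sys.executable, "-c", "import string; print(string.digits)"],
        capture_output=True,
        text=True,
        check=True,
    )
    in_child = child.stdout.strip()
finally:
    string.digits = original  # undo the patch, as monkeypatch would on teardown
```

Here in_parent holds the patched value while in_child holds the unpatched one, which is why running the job in the same process as the test is the simpler route.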

UPDATE:

I finally wrapped the body of my_job1.py in a function and now call it under if __name__ == "__main__":.
The job still runs with the production command python app/my_job1.py, but now I can also import my_job1 in the pytest file and call the function later on (after applying my monkeypatch).
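A minimal sketch of that layout, for illustration only. The extract, transform and load functions here are hypothetical placeholders defined inline so the snippet runs on its own; the real job would import them from the extract, transform and load packages.

```python
# Sketch of my_job1.py after the refactoring (placeholder steps, not the real job).

def extract():
    # Placeholder for the network-backed extract step.
    return [1, 2, 3]


def transform(rows):
    # Placeholder transformation.
    return [row * 2 for row in rows]


def load(rows):
    # Placeholder load step: report how many rows were "written".
    return len(rows)


def main():
    # The whole job lives in one function so a test can import and call it.
    return load(transform(extract()))


if __name__ == "__main__":
    # Only runs when executed as a script: python app/my_job1.py
    main()
```

Importing the module no longer triggers the job, so a test can apply its patches first and then call main().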

My issue remains, though: when executing the my_job1 main function, the extract step is not patched!

Here is an example pytest file following my previous repo structure:

import pickle

import pytest

from app.extract import extr_a
from app import my_job1


@pytest.fixture
def mock_getdata_extr_a(monkeypatch):
    # Replace the network-backed extract with a lookup in a static pickle file.
    def mock_func(aref):
        with open("staticpickle.pkl", "rb") as file:
            data = pickle.load(file)
        return data[aref]

    monkeypatch.setattr(extr_a, "getdata", mock_func)


def test_my_job1(mock_getdata_extr_a):
    # Attempt using the main() function from the script:
    # the monkeypatch should be fine now since the import has been done,
    # we applied the monkeypatch afterward,
    # and we call the job from within the same Python process/env.

    # This call is monkeypatched.
    extr_a.getdata("thisisaref")

    # This one is not! my_job1 imports app.extract.extr_a on its own and
    # calls extr_a.getdata, but this time the monkeypatch is not applied.
    my_job1.main()

Solution

  • It appears that the extr_a object patched in the test is not the same object as my_job1.extr_a, the one the job actually uses.

    So a patch like the one below does not affect calls to extr_a.getdata made from inside another module:

    monkeypatch.setattr(extr_a, "getdata", mock_func)
    

    To patch what another module uses, reference the object through that module in the monkeypatch call, like this:

    monkeypatch.setattr(my_job1.extr_a, "getdata", mock_func)
    

    Now calling my_job1.main() uses the patched extr_a.getdata as well.
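One way this situation can arise, offered as an assumption rather than a diagnosis of the repo above, is that the same source file gets imported under two different module names (for example app.extract.extr_a in the test and extract.extr_a in the job), so Python creates two separate module objects. The sketch below simulates that with in-memory modules. All module names, the make_module helper and fake_getdata are hypothetical, and plain attribute assignment stands in for monkeypatch.setattr, which performs the same assignment and undoes it after the test.

```python
import sys
import types


def make_module(name, source):
    # Hypothetical helper: build a module from source text and register it.
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


EXTRACT_SOURCE = "def getdata(aref):\n    return 'network:' + aref\n"

# The same source loaded twice under different names gives two module objects.
test_side_extr_a = make_module("demo_app_extract_extr_a", EXTRACT_SOURCE)
job_side_extr_a = make_module("demo_extract_extr_a", EXTRACT_SOURCE)

# A stand-in for my_job1 that imports the second copy.
job = make_module(
    "demo_my_job1",
    "import demo_extract_extr_a as extr_a\n"
    "def main():\n"
    "    return extr_a.getdata('ref')\n",
)


def fake_getdata(aref):
    return "static:" + aref


# Patching the copy the test imported leaves the job's copy untouched.
test_side_extr_a.getdata = fake_getdata
result_before = job.main()

# Patching through the job module reaches the object main() really uses.
job.extr_a.getdata = fake_getdata
result_after = job.main()
```

In this sketch result_before still comes from the original function and result_after from the replacement. Going through my_job1.extr_a works regardless of how the job imported the module, because it patches whatever object the job's own name refers to.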