I usually work as a mathematician, building and training new models on data, but I want to learn some new things, so I am an absolute beginner with the following problem. My current learning topic is how to use CI/CD in GitLab. I've implemented a Python project in PyCharm. My folder structure looks as follows:
Here "..." stands for some unimportant files, .py files are regular Python files, and names without a file extension are folders. The file DR.py reads some data, train.py trains a model, pred.py makes a prediction based on the trained model, the test folder contains test files for the previous components, and the flask folder contains a Flask app web service as the deployment for a customer. So far so good; everything works fine on my local machine. Now I want to integrate this project into a CI/CD pipeline in GitLab with the following structure:
run DR.py > run test_DR.py > run train.py > run test_train.py > run pred.py > run test_pred.py > run webservice_flask.py
I'm not sure if this structure is realistic, since I've read that the components of one stage (like test) run in parallel. I know I have to create a .gitlab-ci.yml file in my project's root folder to initialize the CI/CD pipeline. The structure of the file (as far as I understand) should look as follows:
stages:
  - build
  - test
  - deploy
build_database:
  stage: build
  script:
    - echo "Load the data"
    - #...here I want to run the DR.py file, but I don't know how
test_data_reader:
  stage: test
  script:
    - echo "Test the loaded data"
    - #...here I want to run the test_DR.py file, but I don't know how
... and so on for each component of my pipeline.
Can anyone help me, please? Are my initial attempts total nonsense? Can someone tell me the command to put in the YAML file for running the files in the project? Thank you very much!
You don't even have to break it down into stages; you can run all the steps in a single job by listing them under script, like so:
build_database:
  stage: build
  script:
    - python DR.py
    - python test/test_DR.py
    - python train.py
    - python test/test_train.py
    - python pred.py
    - python test/test_pred.py
Basically you run it in the pipeline the same way you would run it locally.
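If you do prefer separate stages, something along these lines might work. Jobs within one stage run in parallel, while the stages themselves run one after another, which is why each test is kept in the same job as the script it checks. Each job also starts from a fresh checkout, so files written by one job have to be handed to later stages as artifacts. This is only a sketch: the image tag, the requirements.txt file, and the artifact folders data/ and model/ are assumptions, so replace them with whatever DR.py and train.py actually write.

```yaml
# Sketch only: image tag, requirements.txt and artifact paths are assumptions.
image: python:3.11

stages:
  - data
  - train
  - predict

before_script:
  - pip install -r requirements.txt   # assumes a requirements.txt in the repo root

read_data:
  stage: data
  script:
    - python DR.py
    - python test/test_DR.py
  artifacts:
    paths:
      - data/        # hypothetical folder where DR.py stores its output

train_model:
  stage: train
  script:
    - python train.py
    - python test/test_train.py
  artifacts:
    paths:
      - model/       # hypothetical folder where train.py stores the model

predict:
  stage: predict
  script:
    - python pred.py
    - python test/test_pred.py
```

By default, a job downloads the artifacts of all jobs from earlier stages, so train_model would see data/ and predict would see both folders. If any command exits with a non-zero status, the job fails and the later stages are skipped.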
The only thing that doesn't make sense in your setup is the final step: running the web service. The CI runner can't act as a server for your web service; a job should just build or deploy and then exit. You will have to decide where and how you want to run this web service and deploy the code there (possibly with some artifacts). If you don't know where to start, have a look at Heroku as an option.
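As a rough illustration of "deploy and quit", a deploy job that pushes the repository to Heroku over Git could look like the sketch below. It assumes that you have created a Heroku app that knows how to start the Flask service, and that you have defined two CI/CD variables in the GitLab project settings, here given the hypothetical names HEROKU_API_KEY and HEROKU_APP_NAME. The image tag is also an assumption; any image that ships with git would do.

```yaml
# Sketch only: HEROKU_API_KEY and HEROKU_APP_NAME are hypothetical CI/CD
# variables that you would have to define yourself in the project settings.
deploy_webservice:
  stage: deploy
  image: python:3.11          # assumption: any image that includes git works
  variables:
    GIT_DEPTH: 0              # full clone; pushing from a shallow clone can be rejected
  script:
    # Push the checked-out commit to the Heroku app's main branch, then exit.
    - git push https://heroku:$HEROKU_API_KEY@git.heroku.com/$HEROKU_APP_NAME.git HEAD:main
  rules:
    # Only deploy from the project's default branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

The job finishes as soon as the push completes; the web service itself then runs on Heroku, not on the CI runner. If you define your own stages list, deploy has to be one of its entries.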