python, google-cloud-platform, airflow, google-cloud-composer

GCP Apache Airflow - How to install Python package from a private repository and import on DAG?


I have a private repository containing common functions for my DAGs (for example: datetime validators, a response-encoder function). I want to import this repository's functions in my DAG file, and I followed this link to do it.

I created a pip.conf file at my-bucket-name/config/pip/pip.conf and added my private GitHub repository to it like this:

[global]
extra-index-url=https://<token>@github.com/my-private-github-repo.git

After this, I wanted to import the repository's functions in my DAG file (for example: from common-repo import *), but I got a 'module not found' error in my DAG. (And unfortunately, the Cloud Composer logs show nothing indicating that the private GitHub repo was ever installed.)

I've searched a lot but can't find how to do this.


Solution

  • You can add the private repo to the requirements of a PythonVirtualenvOperator (here via the @task.virtualenv decorator) like this:

    from airflow import DAG
    from airflow.decorators import task

    @task.virtualenv(
        task_id="virtualenv_python",
        # pip needs the git+ prefix to install directly from a git repository
        requirements=["git+https://<token>@github.com/my-private-github-repo.git"],
        system_site_packages=False,
    )
    def callable_from_virtualenv():
        import your_private_module

        # ...use your_private_module here...

    virtualenv_task = callable_from_virtualenv()


    (Example adapted from the Airflow Python operator examples.)
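    As a side note on why the pip.conf approach from the question didn't work: pip's extra-index-url must point at a package index (a private PyPI-style server such as an Artifact Registry Python repository), not at a git repository URL. A hedged sketch, with a hypothetical index URL:

    ```ini
    [global]
    extra-index-url=https://<token>@my-private-pypi.example.com/simple/
    ```

    With a git repository, the git+https requirement string shown above is the way to go instead.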

    To avoid hardcoding the token / credential in the source code, you can use an Airflow Variable, like this:

    from airflow.decorators import task
    from airflow.models import Variable

    @task.virtualenv(
        task_id="virtualenv_python",
        # the Variable holds the full git+https://<token>@... requirement string
        requirements=[Variable.get("private_github_repo")],
        system_site_packages=False,
    )