Tags: python, amazon-ec2, airflow, airflow-webserver

Import Python libraries (e.g. rapidjson) in Airflow


I want to use the Python library rapidjson in my Airflow DAG. My code repo is hosted on Git. Whenever I merge something into the master or test branch, the changes are automatically reflected in the Airflow UI.

My Airflow is hosted on AWS EC2. Under the EC2 instances, I see three separate instances: scheduler, webserver, and workers.

I connected to each of the three via Session Manager. Once the terminal opened, I installed the library using:

pip install python-rapidjson

I also verified the installation using pip list. Now, I import the library in my DAG's code like this:

import rapidjson

However, when I open the Airflow UI, my DAG shows this error:

No module named 'rapidjson'

Are there additional steps that I am missing? Do I need to import it into my Airflow code base in any other way as well?

Within my Airflow git repository, I also have a "requirements.txt" file. I tried adding this line to it as well:

python-rapidjson==1.5.5

but I do not know how to actually install from it.

I tried this:

pip install requirements.txt

in the Session Manager terminal as well. However, the terminal cannot locate the file. In fact, when I run "ls", I don't see anything.

pwd
/var/snap/amazon-ssm-agent/6522

Solution

  • Have you tried using the PythonVirtualenvOperator?

    It will allow you to install the library at runtime so you don't need to make changes on the server just for one job.

    To run a function called my_callable, simply use the following:

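    from airflow.operators.python import PythonVirtualenvOperator


    def my_callable():
        # Illustrative body. Import inside the function: it runs in the new virtualenv.
        import rapidjson

        return rapidjson.dumps({"status": "ok"})


    my_task = PythonVirtualenvOperator(
        task_id="my_task",  # no trailing space in the task ID
        requirements=["python-rapidjson==1.5.5"],  # list of pip requirement strings
        python_callable=my_callable,
    )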
    

    I still recommend installing core libraries in the server environment itself, but the virtualenv operator is a good fit when only a small minority of jobs need a special library.
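
If you do go the server route, two things from the question are worth checking. First, pip treats `requirements.txt` as a package name unless you pass the `-r` flag. Second, and this is a guess since I can't see your instances, the `pip` on the Session Manager shell's PATH may belong to a different Python than the one Airflow runs under. That would explain why `pip list` shows the package while the DAG import fails. Here is a minimal sketch for checking both, meant to be run with the same interpreter Airflow uses. The helper names `module_available` and `pip_install_command` are made up for illustration, and the requirements path is a placeholder:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


def pip_install_command(requirements_path):
    """Build a pip command tied to this interpreter, not to whichever `pip` is on PATH."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("rapidjson importable:", module_available("rapidjson"))
    # Placeholder path: point this at wherever your repo is checked out.
    print("install with:", " ".join(pip_install_command("requirements.txt")))
```

If the interpreter printed here is not the one your Airflow processes use, run the install command with the full path to Airflow's Python instead. The package has to be visible on the scheduler as well as the workers, since the scheduler is what parses the DAG file.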