Tags: kubernetes, google-cloud-platform, google-cloud-composer, cloud-sql-proxy

How to spin up a Cloud SQL Proxy in a Cloud Composer cluster



Currently we use Airflow to manage jobs and to create DAGs dynamically. For this, a separate DAG checks a database table in PostgreSQL for existing rules, and depending on whether a rule is active or inactive in PostgreSQL, we have manually set things up to turn the dynamic DAGs off or on in Airflow. We are now moving to Google's managed Cloud Composer, but the problem is that we don't have access to the Cloud Composer database. How can we use the Cloud SQL Proxy to resolve this problem?


Solution

  • The Cloud Composer database is already accessible, because a Cloud SQL Proxy is running within the environment's attached GKE cluster. From within the cluster, you can connect to it as the root user through its service name, airflow-sqlproxy-service. For example, on Composer 1.6.0, if you have Kubernetes cluster credentials, you can list the running pods:

    $ kubectl get po --all-namespaces
    composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-database-init-job-kprd5                                  0/1     Completed   0          1d
    composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-scheduler-78d889459b-254fm                               2/2     Running     18         1d
    composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-worker-569bc59df5-x6jhl                                  2/2     Running     5          1d
    composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-worker-569bc59df5-xxqk7                                  2/2     Running     5          1d
    composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-worker-569bc59df5-z5lnj                                  2/2     Running     5          1d
    default                                 airflow-redis-0                                                  1/1     Running     0          1d
    default                                 airflow-sqlproxy-668fdf6c4-vxbbt                                 1/1     Running     0          1d
    default                                 composer-agent-6f89fdb7-0a7a-41b6-8d98-2dbe9f20d7ed-j9d4p        0/1     Completed   0          1d
    default                                 composer-fluentd-daemon-g9mgg                                    1/1     Running     326        1d
    default                                 composer-fluentd-daemon-qgln5                                    1/1     Running     325        1d
    default                                 composer-fluentd-daemon-wq5z5                                    1/1     Running     326        1d
    

    You can see that one of the worker pods is named airflow-worker-569bc59df5-x6jhl and is running in the namespace composer-1-6-0-airflow-1-9-0-6f89fdb7. If I use kubectl exec to run the MySQL CLI inside that pod, I have access to the database:

    $ kubectl exec \
        -it airflow-worker-569bc59df5-x6jhl \
        --namespace=composer-1-6-0-airflow-1-9-0-6f89fdb7 -- \
          mysql \
            -u root \
            -h airflow-sqlproxy-service.default
    
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 27147
    Server version: 5.7.14-google-log (Google)
    
    Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql>
    

    TL;DR: for anything running in your DAGs, connect as root@airflow-sqlproxy-service.default with no password. This connects to the Airflow metadata database through the Cloud SQL Proxy that is already running in your Composer environment.
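
As a rough illustration of the TL;DR above, the following sketch builds a mysql:// connection URI for the existing proxy using only the Python standard library. The host and user come from the answer; the port 3306 is the MySQL default that the mysql command above uses implicitly. The function name metadata_db_uri and the schema name my_schema are hypothetical: the answer does not name the schema, so you would need to look it up first (for example with SHOW DATABASES in the mysql client).

```python
from urllib.parse import quote

# Values taken from the answer: the proxy's in-cluster service name
# and the passwordless root user.
PROXY_HOST = "airflow-sqlproxy-service.default"
PROXY_USER = "root"


def metadata_db_uri(database, host=PROXY_HOST, user=PROXY_USER, port=3306):
    """Build a mysql:// URI that points at the in-cluster Cloud SQL Proxy.

    `database` is a placeholder supplied by the caller; the schema name
    is not given in the answer. No password is included because the
    answer connects as root without one.
    """
    return "mysql://{user}@{host}:{port}/{db}".format(
        user=quote(user, safe=""),
        host=host,
        port=port,
        db=quote(database, safe=""),
    )


if __name__ == "__main__":
    # "my_schema" is a hypothetical name used only for illustration.
    print(metadata_db_uri("my_schema"))
```

A URI in this form could then be handed to whichever MySQL client library is installed in the environment, or stored as an Airflow connection. This only works from code running inside the cluster, since the service name is not resolvable from outside it.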


    If you need to connect to a database other than the Airflow metadata database running in Cloud SQL, you can spin up another proxy by deploying a new proxy pod into GKE (as you would deploy anything else into a Kubernetes cluster).
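
One possible way to deploy such an additional proxy is sketched below: a small Python script that emits a Kubernetes Deployment and Service as JSON, which kubectl can apply. Everything specific here is an assumption rather than part of the answer: the names extra-sqlproxy and extra-sqlproxy-service, the image tag gcr.io/cloudsql-docker/gce-proxy:1.16 (a version 1 proxy image), the example instance connection name, and the port 5432 (chosen because the question mentions PostgreSQL). The sketch also assumes the proxy can authenticate with the GKE node's service account and that this account is allowed to connect to the target Cloud SQL instance.

```python
import json


def proxy_manifests(instance_connection_name, port, name="extra-sqlproxy",
                    image="gcr.io/cloudsql-docker/gce-proxy:1.16"):
    """Return a Kubernetes List holding a Deployment and a Service.

    All defaults are illustrative assumptions, not values from the answer.
    """
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cloudsql-proxy",
                        "image": image,
                        # Listen on all interfaces so other pods can reach
                        # the proxy through the Service defined below.
                        "command": [
                            "/cloud_sql_proxy",
                            "-instances={}=tcp:0.0.0.0:{}".format(
                                instance_connection_name, port),
                        ],
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name + "-service"},
        "spec": {
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    # Hypothetical instance connection name in PROJECT:REGION:INSTANCE form.
    manifests = proxy_manifests("my-project:us-central1:my-instance", 5432)
    print(json.dumps(manifests, indent=2))
```

If the script is saved as, say, proxy_manifests.py, its output could be piped into kubectl apply -f - to create both objects in the default namespace. DAG tasks would then reach the database at extra-sqlproxy-service.default on the chosen port, mirroring how airflow-sqlproxy-service.default is used above.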