google-cloud-platform google-cloud-sql google-cloud-data-fusion cloud-sql-proxy

How to connect Data Fusion to Cloud SQL Proxy


I'm trying to connect Data Fusion to a Cloud SQL for MySQL instance over private IP. I've read many resources and it seems to be possible (at least, I'm not yet convinced that it is impossible). What I have so far:

  • a Data Fusion private instance with a private IP.
  • a Cloud SQL for MySQL instance with private IP.
  • a Cloud SQL Proxy deployed on a virtual machine.
  • everything is connected to the same default VPC network.
  • firewall fully open (ingress and egress on IP range 0.0.0.0/0, all protocols and ports)

From my VM instance I can connect to the MySQL database with the following command: mysql -u root --host 127.0.0.1 --port 3306. When I use the same parameters in Data Fusion, I cannot establish the connection. What can I check to make sure that all of this is set up correctly?

EDIT

I initially accepted the answer from Ajai but then unaccepted it, as I'm not able to make the connection work in a new project. There is probably a step that needs to be done somewhere that is missing here.


Solution

  • I've successfully recreated the environment. Here are the detailed steps; perhaps you missed one along the way:

    1. Create a subnet in a VPC with Private Google Access enabled (see "Configuring Private Google Access").
    2. Create a private Cloud Data Fusion instance attached to the same VPC.
    3. Create a firewall rule allowing the allocated Service Networking range to access the proxy VM on port 3307.
    4. Create a private CloudSQL MySQL instance attached to the same VPC.
    5. Create a VPC peering between Cloud Data Fusion and the same VPC, following the steps in "Set up VPC Network Peering".
    6. Deploy a VM in the subnet from step 1.
    7. Deploy the CloudSQL Proxy on the VM, following the steps in "Install the Cloud SQL Auth proxy".
    8. Run the Cloud SQL Proxy with the following command line (note: 0.0.0.0 binds to all IPs):
        ./cloud_sql_proxy -instances=<Instance Connection Name>=tcp:0.0.0.0:3307

    9. Run the connection test in the CDF console: Successful Connection.
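
To sanity-check the setup before running the test in Data Fusion, a minimal sketch along these lines may help. It assumes that Data Fusion has to address the proxy by the VM's internal IP on port 3307 (127.0.0.1 would refer to Data Fusion's own host, not the VM), which is what the firewall rule in step 3 and the 0.0.0.0 binding in step 8 suggest. The function names, the example IP 10.128.0.5, and the database name mydb are hypothetical placeholders.

```shell
#!/usr/bin/env bash

# Build the JDBC connection string to enter in Data Fusion.
# host: internal IP of the proxy VM (assumption: NOT 127.0.0.1)
# port: port the proxy listens on (3307 in the steps above)
# db:   database name (placeholder)
jdbc_url() {
  local host="$1" port="$2" db="$3"
  printf 'jdbc:mysql://%s:%s/%s\n' "$host" "$port" "$db"
}

# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's /dev/tcp redirection, so no extra tools are needed.
# Run this from another VM in the same VPC, not from the proxy VM itself.
can_reach() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example with placeholder values:
jdbc_url 10.128.0.5 3307 mydb
# prints: jdbc:mysql://10.128.0.5:3307/mydb
```

From a second VM in the same VPC, `can_reach 10.128.0.5 3307 && echo reachable` gives a quick indication of whether the firewall rule and the proxy binding let traffic through before Data Fusion is involved.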

    Once you've verified the above, you can automate the CloudSQL Proxy as a Linux service or startup script.
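
As one way to run the proxy as a Linux service, the sketch below prints a systemd unit file. The function name, the binary path /usr/local/bin/cloud_sql_proxy, the restart policy, and the example connection name are assumptions, not part of the steps above; adjust them to match where the proxy was installed.

```shell
#!/usr/bin/env bash

# Print a systemd unit that starts the proxy at boot.
# $1: the Cloud SQL instance connection name.
# Assumption: the proxy binary was copied to /usr/local/bin/cloud_sql_proxy.
render_proxy_unit() {
  local conn_name="$1"
  cat <<EOF
[Unit]
Description=Cloud SQL Proxy
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/cloud_sql_proxy -instances=${conn_name}=tcp:0.0.0.0:3307
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Example with a placeholder connection name:
render_proxy_unit "my-project:us-central1:my-instance"
```

The output could be saved as /etc/systemd/system/cloud-sql-proxy.service (a hypothetical unit name) and activated with `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now cloud-sql-proxy`.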

    P.S. thanks for quoting our article!

    Edit:

    If you want to use the docker version of the proxy, use the following in place of steps 7 & 8 as per Ajai's answer:

    sudo docker run -d \
      -p 0.0.0.0:3307:3307 \
      gcr.io/cloudsql-docker/gce-proxy:latest /cloud_sql_proxy \
      -instances=<instance connection name>=tcp:0.0.0.0:3307
    

    Edit 2

    There are two key things to point out about the proxy. The first is the port: you might already have 3306 bound to MySQL on the same instance, and using a port like 3307 (or another number) reduces the chance of a conflict. Note that for outbound connections to CloudSQL itself, the CloudSQL Proxy does use 3307 (see "How the Cloud SQL Auth proxy works").

    The second is the listen address: as mentioned above, setting it to 0.0.0.0 binds to all IPs, so the proxy accepts incoming connections from other hosts instead of only those coming from 127.0.0.1.
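
To confirm which address the proxy is actually bound to, the listening sockets on the VM can be inspected. The sketch below classifies a local address in the form shown by `ss -ltn`; the function name and the three labels are hypothetical and only illustrate the loopback versus all-interfaces distinction.

```shell
#!/usr/bin/env bash

# Classify an "address:port" pair, as shown in the Local Address column of
# `ss -ltn`, by which clients can reach it.
classify_bind() {
  case "$1" in
    127.*|"[::1]:"*)          echo "loopback only" ;;
    0.0.0.0:*|"*:"*|"[::]:"*) echo "all interfaces" ;;
    *)                        echo "specific interface" ;;
  esac
}

# On the proxy VM, the address for port 3307 could be extracted with:
#   ss -ltn | awk '$4 ~ /:3307$/ {print $4}'
classify_bind 127.0.0.1:3307   # prints: loopback only
classify_bind 0.0.0.0:3307     # prints: all interfaces
```

A result of "loopback only" would explain a setup where the local mysql client connects but Data Fusion cannot.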