google-cloud-platform google-cloud-functions vpc google-cloud-data-fusion google-vpc

Why is Serverless VPC Connector unnecessary when accessing Private Data Fusion via API from Cloud Functions?


I'm trying to run a private Cloud Data Fusion pipeline from Cloud Functions. My assumption was that I would need to create a Serverless VPC Access connector, as described here: https://cloud.google.com/vpc/docs/configure-serverless-vpc-access#before_you_begin

However, when I made a request to the following API without creating a Serverless VPC Access connector, the pipeline ran successfully.

curl -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/namespace-id/apps/pipeline-name/workflows/DataPipelineWorkflow/start"

Reference site: https://cloud.google.com/data-fusion/docs/reference/cdap-reference#start_a_batch_pipeline
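
For illustration, the same request could be built in Python, roughly as it might be issued from inside a Cloud Function. This is only a sketch using the standard library: the helper name `build_start_request` and its parameters are hypothetical, and the function builds the request without sending it.

```python
import urllib.request


def build_start_request(
    cdap_endpoint: str, namespace: str, pipeline: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a batch pipeline.

    All four arguments are placeholders supplied by the caller; the path
    layout mirrors the start call shown above.
    """
    url = (
        f"{cdap_endpoint.rstrip('/')}/v3/namespaces/{namespace}"
        f"/apps/{pipeline}/workflows/DataPipelineWorkflow/start"
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Sending it would then be a matter of passing the returned object to `urllib.request.urlopen(request, timeout=30)`, with the endpoint and token obtained however the function normally obtains them.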

Why is Serverless VPC Connector unnecessary when accessing Private Data Fusion via API from Cloud Functions?


Solution

  • If you look up the IP address of the CDAP API host, you will see that it resolves to a public IP address:

    nslookup <instance-id>-<project-id>-dot-<region-code>.datafusion.googleusercontent.com 8.8.8.8
    Server:     8.8.8.8
    Address:    8.8.8.8#53
    
    Non-authoritative answer:
    <instance-id>-<project-id>-dot-<region-code>.datafusion.googleusercontent.com
    canonical name = googlehosted.l.googleusercontent.com.
    Name:   googlehosted.l.googleusercontent.com
    Address: 64.233.170.132
    Name:   googlehosted.l.googleusercontent.com
    Address: 2404:6800:4003:c1a::84
    

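    By default, all requests to the CDAP API go to a publicly accessible endpoint. Cloud Functions have outbound internet access by default, so the call succeeds without a Serverless VPC Access connector; the request is authorized by the bearer token, not by network location. If needed, you can set up Private Google Access so that requests to *.datafusion.googleusercontent.com are routed to a private IP address when the source VM is inside a GCP VPC.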

    VPC peering, by contrast, is required so that VMs in the Cloud Data Fusion (CDF) tenant project can reach IP addresses in the customer project's VPC, which is needed to run previews and to validate source/sink connections during pipeline deployments.
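
The DNS check above can also be sketched in Python, for example to log from inside a Cloud Function which addresses the API host resolves to. This is a minimal sketch using only the standard library; the helper names `resolved_addresses` and `is_public` are hypothetical, and "public" here simply means what Python's `ipaddress` module classifies as globally routable.

```python
import ipaddress
import socket


def resolved_addresses(hostname: str) -> list[str]:
    """Return the distinct IP addresses that the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def is_public(address: str) -> bool:
    """True if the address is globally routable rather than private or reserved."""
    return ipaddress.ip_address(address).is_global
```

Calling `is_public` on each entry returned by `resolved_addresses` for the instance's API hostname gives a quick indication of whether requests would leave over the public internet. The results depend on the resolver used by the environment where the code runs.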