Search code examples
google-cloud-platformgoogle-bigquerygoogle-workflows

GCP workflow: load external sql file?


I am planning to have a Cloud Scheduler that calls a GCP Workflows every day at 8 a.m. My GCP Workflows will have around 15 different steps and will be only transformations (update, delete, add) on BigQuery. Some queries will be quite long and I am wondering if there is a way to load a .sql file into a GCP Workflows task1.yaml?

#workflow entrypoint
ProcessItem:
  params: [project, gcsPath]
  steps:
    - initialize:
        assign:
          - dataset: wf_samples
          - input: ${gcsPath}
          - sqlQuery: QUERY HERE
   ...

Solution

  • You need to do something similar: (of course you can assign this to a variable like input)

    #workflow entrypoint
    main:
      steps:
        - getSqlfile:
            call: http.get
            args:
              url: https://raw.githubusercontent.com/jisaw/sqlzoo-solutions/master/select-in-select.sql
              headers:
                Content-Type: "text/plain"
            result: queryFromFile
        - final:
            return: ${queryFromFile.body}
    

    For Cloud Storage that may look like:

     call: http.get
        args:
          url: https://storage.cloud.google.com/................./q1.sql
          headers:
            Content-Type: "text/plain"
          auth:
            type: OIDC
        result: queryFromFile
    

    Or event with this format (different URL syntax + OAuth2)

    call: http.get
        args:
          url: https://storage.googleapis.com/................./q1.sql
          headers:
            Content-Type: "text/plain"
          auth:
            type: OAuth2
        result: queryFromFile
    

    Make sure that invoker has the right permission to access the Cloud Storage file.

    Note: On further testing, this to work correctly the text/plain mime-type must be set on the GCS file.