Tags: post, runtime-error, google-cloud-run, google-cloud-scheduler

Google Cloud Scheduler failing to invoke CloudRun container


I have deployed a containerized R application to Google Cloud Run using Docker. Since I want to run the code at a regular interval (not request-based), I have set up a Cloud Scheduler job that should invoke the container via an HTTP request. The job fails with a 503 response, and I cannot figure out why. Here is the detailed log message:

{
  httpRequest: {
    status: 503
  }
  insertId: "qx6q58f4iewwp"
  jsonPayload: {
    @type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
    jobName: "projects/PROJECT_ID/locations/europe-west1/jobs/scheduled-cloud-run-job"
    status: "UNAVAILABLE"
    targetType: "HTTP"
    url: "CONTAINER_URL"
  }
  logName: "projects/PROJECT_ID/logs/cloudscheduler.googleapis.com%2Fexecutions"
  receiveTimestamp: "2023-06-06T13:20:05.745924313Z"
  resource: {
    labels: {
      job_id: "scheduled-cloud-run-job"
      location: "europe-west1"
      project_id: "PROJECT_ID"
    }
    type: "cloud_scheduler_job"
  }
  severity: "ERROR"
  timestamp: "2023-06-06T13:20:05.745924313Z"
}

Here is the Terraform configuration for the Cloud Scheduler job:

# -- Create cloud scheduler job -- #
resource "google_cloud_scheduler_job" "default" {
  name             = "scheduled-cloud-run-job"
  description      = "Invokes the Cloud Run container with our pipeline on a recurrent basis."
  schedule         = "*/10 * * * *"
  time_zone        = "Europe/Stockholm"
  retry_config {
    retry_count = 1
  }
  http_target {
    http_method = "GET"
    uri         = "${google_cloud_run_service.default.status[0].url}"
    #body        = base64encode("{\"run_container\": \"run\"}")
    #headers     = {"Content-Type" : "application/json", "User-Agent" : "Google-Cloud-Scheduler"}
    oidc_token {
      service_account_email = google_service_account.default.email
    }
  }
}

Here is the Terraform configuration for my Cloud Run service:

resource "google_cloud_run_service" "default" {
    name     = "containerized-pipeline"
    location = var.region
    project  = var.project_id
    template {
      spec {
        containers {
          image = "${local.artifact_storage_address}:${local.tag}"
          ports {
            #name           = "h2c"
            container_port = 8080
          }
          resources {
            limits = {
              "cpu"    = "1000m"
              "memory" = "2000Mi"
            }
          }
        }
        container_concurrency = 1
      }
      metadata {
        annotations = {
          "run.googleapis.com/client-name"      = "terraform"
          "autoscaling.knative.dev/minScale"    = 1
          "autoscaling.knative.dev/maxScale"    = 30
          # "run.googleapis.com/cpu-throttling"   = false
        }
      }
    }
    traffic {
        percent         = 100
        latest_revision = true
    }
    depends_on = [
        null_resource.docker_build
    ]
}

data "google_iam_policy" "noauth" {
  binding {
    role    = "roles/run.invoker"
    members = ["allUsers"]
  }
}

resource "google_cloud_run_service_iam_policy" "noauth" {
  location    = google_cloud_run_service.default.location
  project     = google_cloud_run_service.default.project
  service     = google_cloud_run_service.default.name
  policy_data = data.google_iam_policy.noauth.policy_data
}

And here is the part of my source code that starts the API server in my container and handles the requests sent by Cloud Scheduler:

# `con` (database connection) and `q` (query returning the start date) are defined elsewhere in the script.
runPipeline <- function(run_container) {
  body <- as.character(run_container)
  if (body != "run") {
    return("Something went wrong, please make sure the GET request is sent correctly.")
  }
  # Look up the date from which the pipeline should start.
  date <- as.character(dbGetQuery(con, q)$`f0_`)
  data <- reformatData(data = cleanData(df = getData(date = date)))
  if (date == "2017-01-01") {
    load2BQinitial(data = data)
  } else {
    load2BQincremental(data = data)
  }
  paste0("Running the pipeline starting from ", date)
}


# -- Create API endpoint to receive & feed new data to model -- #  

newBeakr() %>% 
  httpGET(path = "/launch", decorate(runPipeline)) %>%        # Respond to GET requests at the "/launch" route
  handleErrors() %>%                                          # Handle any errors with a JSON response
  listen(host = "0.0.0.0", port = 8080)                       # Start the server on port 8080
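
The handler above is registered at the "/launch" route and reads a run_container argument, while the http_target shown earlier points at the service root URL. As a minimal sketch, assuming the job is meant to reach that route and that beakr's decorate() fills run_container from the query string, the target could be written as below. The explicit audience is an assumption about how the OIDC token should be scoped; with the allUsers invoker binding above it may not matter. Whether the original job used this URI is not stated.

```hcl
# Sketch only: a variant of the http_target block inside
# google_cloud_scheduler_job.default from the question.
http_target {
  http_method = "GET"
  # Path and query parameter are taken from the beakr handler; assumed, not confirmed.
  uri = "${google_cloud_run_service.default.status[0].url}/launch?run_container=run"
  oidc_token {
    service_account_email = google_service_account.default.email
    # Assumption: scope the token to the service root URL rather than the full URI.
    audience = google_cloud_run_service.default.status[0].url
  }
}
```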

I came across several Google issues being tracked for 500 and 503 errors, but none of them gave a conclusive solution. Does anyone have any idea why I am getting the 503?

Sending a curl request to the service produces the following log entry:

{
  httpRequest: {
    latency: "2.733624s"
    protocol: "H2C"
    remoteIp: "X.X.X.X"
    requestMethod: "GET"
    requestSize: "544"
    requestUrl: "<URL>a.run.app"
    responseSize: "1238"
    serverIp: "X.X.X.X"
    status: 503
    userAgent: "curl/7.74.0"
  }
  insertId: "XXXXXXXXXXXXXXXX"
  labels: {
    instanceId: "XXXXXXXXXXXXX"
  }
  logName: "projects/PROJECT_ID/logs/run.googleapis.com%2Frequests"
  receiveTimestamp: "2023-06-07T07:01:14.296557338Z"
  resource: {
    labels: {
      configuration_name: "containerized-pipeline"
      location: "europe-west1"
      project_id: "PROJECT_ID"
      revision_name: "containerized-pipeline-00001-pwn"
      service_name: "containerized-pipeline"
    }
    type: "cloud_run_revision"
  }
  severity: "ERROR"
  spanId: "12660848597156063613"
  textPayload: "The request failed because either the HTTP response was malformed or connection to the instance had an error. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#malformed-response-or-connection-error"
  timestamp: "2023-06-07T07:01:11.486699Z"
  trace: "projects/PROJECT_ID/traces/34366480e3a4143d4008eed0452eacf0"
  traceSampled: true
}

UPDATE: I should also mention that I ran my R application locally and sent an API request to localhost, and my callback function worked just fine. So the issue has something to do with how my Cloud Run instance and Cloud Scheduler are interacting; I just don't know what it is.


Solution

  • After some additional debugging I was able to figure out and resolve the issue: it was a mismatch in the location of my resources, though I'm not sure the 503 error accurately reflects this.

    For more context: my Terraform state-file bucket is in europe-west4, and all of my resources were originally located in europe-west4 as well. Somewhere along the way I created my Cloud Scheduler job in europe-west1, so I also re-created the Artifact Registry repository, container, and Cloud Run service in europe-west1. While debugging, I eventually moved the Artifact Registry repository and Cloud Run service back to europe-west4 and kept the scheduler in europe-west1, at which point I stopped getting the 503 error. Everything works fine now.

    Not sure how useful or replicable my solution is, but hopefully it helps anyone who runs into the same issue.
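
    One way to make the region of each resource explicit is to set every location from a single variable. This is a minimal sketch, not the exact configuration I ended up with: run_region, scheduler_region, and the repository_id value are hypothetical names, and only the location arguments are shown. The remaining arguments stay as in the question.

    ```hcl
    # Hypothetical variables; the defaults mirror the regions described above.
    variable "run_region" {
      type    = string
      default = "europe-west4"
    }

    variable "scheduler_region" {
      type    = string
      default = "europe-west1"
    }

    resource "google_artifact_registry_repository" "default" {
      location      = var.run_region
      repository_id = "containerized-pipeline" # hypothetical name
      format        = "DOCKER"
    }

    resource "google_cloud_run_service" "default" {
      name     = "containerized-pipeline"
      location = var.run_region
      # template, traffic, and depends_on as in the question
    }

    resource "google_cloud_scheduler_job" "default" {
      name   = "scheduled-cloud-run-job"
      region = var.scheduler_region
      # schedule, time_zone, retry_config, and http_target as in the question
    }
    ```

    With the locations in one place, moving the Cloud Run service and the registry together is a one-line change, and the scheduler's region stays visibly separate.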