google-cloud-platform google-cloud-run knative-serving knative

Simple HelloWorld app on cloudrun (or knative) seems too slow

I deployed a sample HelloWorld app on Google Cloud Run, which is basically k-native, and every call to the API takes 1.4 seconds at best, in an end-to-end manner. Is it supposed to be so?

The sample app is at https://cloud.google.com/run/docs/quickstarts/build-and-deploy

I deployed the very same app on my localhost as a docker container and it takes about 22ms, end-to-end.

The same app on my GKE cluster takes about 150 ms, end-to-end.

import os

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    target = os.environ.get('TARGET', 'World')
    return 'Hello {}!\n'.format(target)

if __name__ == "__main__":
    app.run(debug=True,host='0.0.0.0',port=int(os.environ.get('PORT', 8080)))

I am a little experience in FaaS and I expect API calls would get faster as I invoked them in a row. (as in cold start vs. warm start)

But no matter how many times I execute the command it doesn't go below 1.4 seconds.

I think the network distance isn't the dominant factor here. The round-trip time via ping to the API endpoint is only 50ms away, more or less

So my questions are as follows:

Is it potentially an unintended bug? Is it a technical difficulty which will be resolved eventually? Or maybe nothing's wrong, it's just the SLA of k-native?
If nothing's wrong with Google Cloud Run and/or k-native, what is the dominant time-consuming factor here for my API call? I'd love to learn the mechanism.

Additional Details:

Where I am located at: Seoul/Asia
The region for my Cloud Run app: us-central1
type of Internet connection I am testing under: Business, Wired
app's container image size: 343.3MB
the bucket location that Container Registry is using: gcr.io

WebPageTest from Seoul/Asia (warmup time):

Content Type: text/html
Request Start: 0.44 s
DNS Lookup: 249 ms
Initial Connection: 59 ms
SSL Negotiation: 106 ms
Time to First Byte: 961 ms
Content Download: 2 ms

WebPageTest from Chicago/US (warmup time):

Content Type: text/html
Request Start: 0.171 s
DNS Lookup: 41 ms
Initial Connection: 29 ms
SSL Negotiation: 57 ms
Time to First Byte: 61 ms
Content Download: 3 ms

ANSWER by Steren, the Cloud Run product manager

We have detected high latency when calling Cloud Run services from some particular regions in the world. Sadly, Seoul seems to be one of them.

Solution

[Update: This person has a networking problem in his area. I tested his endpoint from Seattle with no problems. Details in the comments below.]

I have worked with Cloud Run constantly for the past several months. I have deployed several production applications and dozens of test services. I am in Seattle, Cloud Run is in us-central1. I have never noticed a delay. Actually, I am impressed with how fast a container starts up.

For one of my services, I am seeing cold start time to first byte of 485ms. Next invocation 266ms, 360ms. My container is checking SSL certificates (2) on the Internet. The response time is very good.

For another service which is a PHP website, time to first byte on cold start is 312ms, then 94ms, 112ms.

What could be factors that are different for you?

How large is your container image? Check Container Registry for the size. My containers are under 100 MB. The larger the container the longer the cold start time.
Where is the bucket located that Container Registry is using? You want the bucket to be in us-central1 or at least US. This will change soon with when new Cloud Run regions are announced.
What type of Internet connection are you testing under? Home based or Business. Wireless or Ethernet connection? Where in the world are you testing from? Launch a temporary Compute Engine instance, repeat your tests to Cloud Run and compare. This will remove your ISP from the equation.

Increase the memory allocated to the container. Does this affect performance? Python/Flask does not require much memory, my containers are typically 128 MB and 256 MB. Container images are loaded into memory, so if you have a bloated container, you might now have enough memory left reducing performance.

What does Stackdriver logs show you? You can see container starts, requests, and container terminations.