google-cloud-platform stackdriver google-cloud-run

Stackdriver Trace with Google Cloud Run

I have been diving into a Stackdriver Trace integration on Google Cloud Run. I can get it to work with the agent, but I am bothered by a few questions.

Given that

The Stackdriver agent aggregates traces in a small buffer and sends them periodically.
CPU access is restricted when a Cloud Run service is not handling a request.
There is no shutdown hook for Cloud Run services, so you can't flush the buffer before shutdown: the container just gets a SIGKILL, which is a signal your application can't catch.
Running a background process that sends information outside of the request-response cycle seems to violate the Knative Container Runtime contract.
The collection of logging data is documented and does not require me to run an agent, but there is no such solution for telemetry.
I found one report of someone experiencing lost traces on Cloud Run using the agent-based approach.

How Google does it

I went into the source code for the Cloud Endpoints ESP (the Cloud Run integration is in beta) to see if they solve it in a different way, but the same pattern is used there: traces are kept in a buffer (1s) that is flushed periodically.

Question

While my tracing integration seems to work in my test setup, I am worried about incomplete and missing traces when I run this in a production environment.

Is this a hypothetical problem or a real issue?
It looks like the right way to approach this is to write telemetry to logs, instead of using an agent process. Is that supported with Stackdriver Trace?

Solution

Cloud Run now supports sending SIGTERM. If your application handles SIGTERM, it gets a 10-second grace period before shutdown.

You can use the 10 seconds to:

Flush buffers that have unsent data
Close connections to other systems
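
As a rough illustration of the flush-on-SIGTERM idea, here is a minimal Python sketch. `TraceBuffer`, `install_sigterm_event` and `wait_and_flush` are hypothetical names made up for this example and are not part of any Stackdriver or Cloud Run library; the `send` callable stands in for whatever your tracing exporter uses to ship spans. The signal handler only sets an event, and the actual flush happens in normal code.

```python
import signal
import threading


class TraceBuffer:
    """Hypothetical stand-in for a tracing exporter's in-memory buffer."""

    def __init__(self, send):
        self._send = send  # callable that ships a list of spans (assumption)
        self._lock = threading.Lock()
        self._spans = []

    def add(self, span):
        with self._lock:
            self._spans.append(span)

    def flush(self):
        """Send everything buffered so far; return the number of spans sent."""
        with self._lock:
            pending, self._spans = self._spans, []
        if pending:
            self._send(pending)
        return len(pending)


def install_sigterm_event():
    """Register a SIGTERM handler that only sets an event.

    Must be called from the main thread (a requirement of signal.signal).
    """
    shutdown = threading.Event()
    signal.signal(signal.SIGTERM, lambda signum, frame: shutdown.set())
    return shutdown


def wait_and_flush(shutdown, buffer, timeout=None):
    """Block until SIGTERM arrives, then flush the buffer.

    Returns the number of spans flushed, or 0 if the timeout expired first.
    """
    if shutdown.wait(timeout):
        return buffer.flush()
    return 0
```

In a real service you would call `install_sigterm_event()` once at startup and run `wait_and_flush(...)` from the main thread (or hook the flush into your web server's own shutdown callback), making sure the flush finishes well inside the 10 seconds.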

Docs: Container runtime contract