Google CDN behavior when serving concurrent (chunked) requests

I'm trying to understand the Google CDN behavior in the following scenario:

Let's assume I have a backend service serving chunked http data. For the sake of the explanation, let's assume that serving a single request takes up to 10s
Let's imagine the case where a file is requested through the CDN by a client A, and that this file is not currently cached in the CDN. The request will go to the backend service, that starts serving the file. Client A will immediately start receiving HTTP chunks
After 5s, another client B requests the same file. I can envision 3 possible behaviors, but I can't figure out how to control this through the CDN configuration:

Option a: the CDN simply pass the request to the backend service, ignoring that half of the file has already been served and could already be cached. Not desirable as the backend service will be reached twice and serve the same data twice.

Option b: the CDN puts the second request on "hold", waiting for the first one to be terminated before serving the client B from its cache (in that case, the request B does not reach the backend service). Ok, but still not amazing as client B will wait 5s before getting any http data.

Option c: the CDN immediately serves the first half of the http chunks and then serves the remaining http chunks at the same pace than request A. Ideal!

Any ideas on the current behavior ? And what could we do to get the option C, which is by far our preferred option ?

Tnx, have a great day!

Jeannot

Solution

It is important to note that GFE historically cached only complete responses and stored each response as a single unit. As a result, the current behavior will follow option A. You can take a look at this help center article for more details.

However, with the introduction of Chunk caching, which is currently in Beta, large response bodies are treated as a sequence of chunks that can each be cached independently. Response bodies less than or equal to 1 MB in size can be cached as a unit, without using chunk caching. Response bodies larger than 1 MB are never cached as a unit. Such resources will either be cached using chunk caching or not cached at all.

Only resources for which byte range serving is supported are eligible for chunk caching. GFE caches only chunk data received in response to byte range requests it initiated, and GFE initiates byte range requests only after receiving a response that indicates the origin server supports byte range serving for that resource.

To be more clear, once Chunk caching is in GA, you would be able to achieve your preferred option C.