What are the use cases to store ReadableStream in the distributed data store like Cloudflare Workers KV?


Cloudflare's own globally distributed data store – Workers KV – can accept data of three "types": string, ArrayBuffer and ReadableStream.

While the use cases for the former two are clear enough, I am struggling to see how a stored ReadableStream could be useful. I am familiar with the concept (a stream lets you deliver values incrementally over time), but what is the point of putting one in a data store? What are the typical scenarios?


Solution

  • The difference between passing a string, ArrayBuffer, or ReadableStream is not what data is stored, but rather how the data gets there. Note that you can store data as a string and then later read it as an ArrayBuffer or vice versa (strings are converted to/from bytes using UTF-8). When you pass a ReadableStream to put(), the system reads data from the stream and stores that data; it does not store the stream itself. Similarly, when using get(), you can specify "stream" as the second parameter to get a ReadableStream back; when you read from this stream, it will produce the value's content.
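To make these semantics concrete, here is a minimal in-memory sketch. `FakeKV` and `demo` are hypothetical names invented for illustration; this is not how Workers KV is implemented, it only mimics the `put()`/`get()` behaviour described above. It assumes a runtime with the standard `Response`, `TextEncoder` and `TextDecoder` globals (Workers, or Node 18+).

```javascript
// Hypothetical in-memory stand-in for a KV namespace (illustration only).
class FakeKV {
  constructor() {
    this.store = new Map(); // key -> Uint8Array of stored bytes
  }

  async put(key, value) {
    let bytes;
    if (typeof value === "string") {
      // Strings are stored as their UTF-8 bytes.
      bytes = new TextEncoder().encode(value);
    } else if (value instanceof ArrayBuffer) {
      bytes = new Uint8Array(value.slice(0)); // copy the buffer
    } else {
      // Assumed to be a ReadableStream: drain it and keep only the bytes.
      // The stream object itself is never stored.
      bytes = new Uint8Array(await new Response(value).arrayBuffer());
    }
    this.store.set(key, bytes);
  }

  async get(key, type = "text") {
    const bytes = this.store.get(key);
    if (bytes === undefined) return null;
    if (type === "text") return new TextDecoder().decode(bytes);
    if (type === "arrayBuffer") return bytes.slice().buffer;
    if (type === "stream") return new Response(bytes).body;
    throw new Error(`unsupported type: ${type}`);
  }
}

// Write with one "type", read back with another.
async function demo() {
  const kv = new FakeKV();

  // Stored as a string, read back as raw bytes.
  await kv.put("greeting", "héllo");
  const buffer = await kv.get("greeting", "arrayBuffer");

  // Stored from a stream, read back as a string.
  await kv.put("streamed", new Response("from a stream").body);
  const text = await kv.get("streamed", "text");

  return { byteLength: buffer.byteLength, text };
}
```

The type you pass to `put()` and the type you ask for in `get()` are independent: both are just different views of the same stored bytes.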

    The main case where you would want to use streams is when you want to directly store the body of an HTTP request into a KV value, or when you want to directly return a KV value as the body of an HTTP response. Using streams in these cases avoids the need to hold the entire value in memory all at once; instead, bytes can stream through as they arrive.

    For example, instead of doing:

    // BAD
    let value = await request.text();
    await kv.put(key, value);
    

    You should do this:

    // GOOD
    await kv.put(key, request.body);
    

    This is especially important when the value is many megabytes in size. The first version reads the entire value into memory to construct one large string (decoding UTF-8 to UTF-16 along the way), only to immediately write that string back out to KV (converting UTF-16 back to UTF-8). The second version copies bytes straight from the incoming connection into KV without ever holding the whole value in memory at once.
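The decode and re-encode step in the first version is real work, and it can be observed directly. The sketch below is only an illustration: `roundTripThroughText` is a hypothetical helper and the URL is a placeholder. It assumes a runtime with the standard `Request` and `TextEncoder` globals (Workers, or Node 18+). For a body that is not valid UTF-8, the bytes that come out of the string round trip are not the bytes that went in.

```javascript
// Hypothetical helper: mimics what the "BAD" version does to a request body.
// The body is decoded to a string, then encoded back to UTF-8 bytes.
async function roundTripThroughText(bytes) {
  const request = new Request("https://example.com/upload", {
    method: "PUT",
    body: bytes,
  });
  const text = await request.text(); // UTF-8 bytes -> JavaScript string
  return new TextEncoder().encode(text); // JavaScript string -> UTF-8 bytes
}

// Compare input and output for a valid and an invalid UTF-8 payload.
async function compare() {
  const validUtf8 = new TextEncoder().encode("abc");
  const notUtf8 = new Uint8Array([0xff, 0xfe, 0x00]); // arbitrary binary data

  const a = await roundTripThroughText(validUtf8);
  const b = await roundTripThroughText(notUtf8);

  return {
    validUnchanged: a.length === validUtf8.length,
    // Invalid sequences are replaced during decoding, so the length differs.
    binaryChanged: b.length !== notUtf8.length,
  };
}
```

Passing `request.body` straight to `put()` skips this conversion entirely, so it is also the safer choice when the body may be binary.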

    Similarly, for a response, instead of doing:

    // BAD
    let value = await kv.get(key);
    return new Response(value);
    

    You can do:

    // GOOD
    let value = await kv.get(key, "stream");
    return new Response(value);
    

    This way, the response bytes get streamed from KV to the HTTP connection. This not only saves memory and CPU time, but also means that the client starts receiving bytes faster, because your Worker doesn't wait until all bytes are received before it starts forwarding them.
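Putting both directions together, a request handler might look roughly like the sketch below. `handleRequest`, the use of the URL path as the key, and the chosen status codes are all assumptions made for illustration; `kv` stands for a KV namespace binding.

```javascript
// Sketch only: `kv` is assumed to be a Workers KV namespace binding, and
// using the URL path as the key is a hypothetical choice for this example.
async function handleRequest(request, kv) {
  const key = new URL(request.url).pathname.slice(1);
  if (!key) {
    return new Response("Missing key", { status: 400 });
  }

  if (request.method === "PUT") {
    if (!request.body) {
      return new Response("Missing body", { status: 400 });
    }
    // Hand the body stream straight to KV; it is never buffered here.
    await kv.put(key, request.body);
    return new Response(null, { status: 204 });
  }

  if (request.method === "GET") {
    // Ask for a stream so bytes can be forwarded as they arrive.
    const stream = await kv.get(key, "stream");
    if (stream === null) {
      return new Response("Not found", { status: 404 });
    }
    return new Response(stream);
  }

  return new Response("Method not allowed", {
    status: 405,
    headers: { Allow: "GET, PUT" },
  });
}
```

In a Worker you would call this from your fetch handler, passing the KV namespace binding as `kv`.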