I am implementing a REST API endpoint that returns lists of objects of different types. The list will have roughly the following shape:
{
"type1": [
{ "prop1": "value1a", ... },
{ "prop1": "value1b", ... },
...
],
"type2": [
{ "prop2": "value2a", ... },
...
],
"type3": [
{ "prop3": "value3a", ... },
...
]
}
As the list of objects can get quite long, I would like to emit it as a stream, so that neither the server nor the client have to keep the whole list in memory and the client can already start processing the data before all of it has arrived. The JSON object would be streamed in chunks like this:
{
"type1": [
{ "prop1": "value1a", ... },
{ "prop1": "value1b", ... }
],
"type2": [
...
So far, so good.
To make it easier for people to use my REST API, I want to provide a TypeScript library that provides methods to call the various API endpoints. My problem is that JavaScript streams are made for flat structures, but I’m trying to find a way to stream a nested structure.
How can I stream a nested structure in JavaScript/TypeScript?
After experimenting with this for a while, here are two approaches that I have come up with along with their advantages and disadvantages.
Flatten the stream into a stream of entries (ReadableStream<["type1", Type1Object] | ["type2", Type2Object] | ["type3", Type3Object]>).
Flattened entries streams are rather easy to create, as the data already arrives in the form of a flat stream (of bytes) from the REST API, so it just needs to be transformed into a stream of entries.
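For example, an entries stream can be produced with a TransformStream. The sketch below assumes a hypothetical upstream stage that has already tokenized the JSON into "key" and "value" events (in practice you would use an incremental JSON parser for that step); it only shows the flattening itself:

```typescript
// Hypothetical parse events from an incremental JSON parser (an assumption
// for this sketch; the tokenizing step is out of scope here).
type ParseEvent =
  | { kind: "key"; key: string }       // a top-level key like "type1"
  | { kind: "value"; value: unknown }; // one object from the current array

type Entry = [string, unknown];

// Pair every value with the most recently seen top-level key.
function flattenEvents(events: ReadableStream<ParseEvent>): ReadableStream<Entry> {
  let currentKey = "";
  return events.pipeThrough(
    new TransformStream<ParseEvent, Entry>({
      transform(event, controller) {
        if (event.kind === "key") {
          currentKey = event.key;
        } else {
          controller.enqueue([currentKey, event.value]);
        }
      },
    }),
  );
}
```

A helper like this only keeps the current key in memory, so the flattened stream preserves the memory benefits of streaming on the producing side.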
The main problem that I have experienced while trying to consume flattened streams is that their type doesn't guarantee, or even indicate, in what order the items arrive. Let's assume a ReadableStream<["type1", Type1Object] | ["type2", Type2Object] | ["type3", Type3Object]> and I want to first process the "type1" objects and then the "type2" objects. While the REST API itself might guarantee that first all the "type1" objects arrive, then all the "type2" objects and then all the "type3" objects, and I might mention this in the documentation of my code, the ReadableStream type itself does not guarantee this order, so it would be bad style to rely on it (and the order of the REST API response might change). Instead, a consumer has to iterate over the whole stream to be sure to have collected all the objects of one type. This means that consumers are either forced to handle all object types in parallel, or they have to cache a lot of objects in memory before they can process them.
Return a nested stream for each object type (ReadableStream<(ReadableStream<Type1Object> & { type: "type1" }) | (ReadableStream<Type2Object> & { type: "type2" }) | (ReadableStream<Type3Object> & { type: "type3" })>). The advantage of this approach is that it resembles the underlying object structure much more closely and can be iterated over in the same way as the plain object. Individual sub streams can be teed or piped to different destinations if needed.
A common way to consume a nested stream would be through a nested iteration:
for await (const subStream of parentStream) {
for await (const chunk of subStream) {
// Do something with chunk
}
}
Producing a stream that can be consumed in this way is not too complicated, as the chunks are consumed in the order in which they arrive from the REST API. However, there are many other ways in which such a stream can be consumed, and this is where the main challenge with this approach emerges: it is very difficult to implement it right. For example, a consumer might cancel a sub stream early, either through a break in the iteration or by calling subStream.cancel(). The nested stream needs to handle this in a way that the parent stream continues emitting the rest of the sub streams and that those sub streams still emit their data.

If an implementation manages to get all of this right, nested streams can be a useful (although unusual) way to represent this data. As an inspiration, you can have a look at my implementation of StreamSplitter, which converts a flattened stream to a nested stream (with the assumption that the chunks arrive in order).
Consumers of a nested stream need to keep in mind that unused sub streams must be discarded by calling subStream.cancel(), otherwise their data will remain buffered in memory.
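For example, a consumer that is only interested in the "type2" group should cancel every other sub stream it receives. The sketch below builds the sub streams from in-memory arrays purely for brevity; collectType2 and subFrom are hypothetical helpers, not part of any library:

```typescript
type SubStream = ReadableStream<unknown> & { type: string };

// Helper for the example: builds a tagged sub stream from an in-memory array.
function subFrom(type: string, items: unknown[]): SubStream {
  const sub = new ReadableStream<unknown>({
    start(controller) {
      for (const item of items) controller.enqueue(item);
      controller.close();
    },
  }) as SubStream;
  sub.type = type;
  return sub;
}

// Collects only the "type2" objects and explicitly discards everything else.
async function collectType2(parent: ReadableStream<SubStream>): Promise<unknown[]> {
  const result: unknown[] = [];
  for await (const sub of parent) {
    if (sub.type !== "type2") {
      await sub.cancel(); // without this, the unused data stays buffered
      continue;
    }
    for await (const obj of sub) {
      result.push(obj);
    }
  }
  return result;
}
```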
From the experience that I’ve gathered so far, I think that each of the approaches has advantages/disadvantages in certain scenarios. If implemented right, nested streams are more versatile, since they can be easily converted to flattened streams if needed, which is not possible the other way round. But they require a lot more thought and testing to implement. So my personal conclusion is that when time and resources allow, nested streams are the better option for the user, but otherwise, flattened streams also work and are much easier to implement.