spring-webflux hazelcast hazelcast-imap hazelcast-jet

Expose entire IMap over REST/WebFlux without excessive Heap utilisation

I have a distributed Hazelcast map (IMap) which is potentially very large.

I am tasked with returning the ENTIRE map values() collection as the response to an HTTP GET request.

To minimise heap utilisation I plan to use Spring WebFlux and return an instance of Flux.

My concern is that invocation of IMap#values().iterator().next(), which is implicit in Flux.fromIterable(), might deserialize ALL values from ALL cluster members, thus blowing the heap of the JVM which is a Hazelcast client servicing the GET request.

If this concern is well-founded, then:

Would Hazelcast Jet provide a solution? I could create a Pipeline.withSource(IMap), but how would I create the sink as an instance of Flux which can be returned?

Many thanks, Robin.

Solution

This concern is valid. There is in fact a query result size limit (see here), so the values() call will fail for a large enough map.

Jet isn't useful for a request-response scenario: it can process large maps in a streaming way, but it delivers the map entries to a sink, not back to the caller. You could probably hack around that, but it's not straightforward.

In the upcoming Hazelcast 4.1 there will be an SQL API, which would be the best fit for your use case: if you query the map using SQL, even large results can be streamed to the client with roughly constant memory usage.

As a workaround, you can look at the code behind the Jet map reader, ReadMapOrCacheP.java, which uses an internal API to read the map incrementally. Be aware that this API is internal and unsupported, and it can be changed or removed in any release.
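
To make "reading the map incrementally" concrete, here is a minimal sketch in plain Java, independent of Hazelcast. It shows a lazy Iterable that pulls fixed-size batches on demand, so only one batch sits on the heap at a time. BatchedIterable and BatchSource are hypothetical names introduced for illustration, not Hazelcast APIs. The offset-based fetch is a simplifying assumption: a real distributed map has no stable global ordering, so an actual reader would need something like per-partition cursors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Lazily iterates a large data set, holding at most one batch in memory. */
final class BatchedIterable<T> implements Iterable<T> {

    /** Hypothetical paging contract (an assumption, NOT a Hazelcast API). */
    interface BatchSource<T> {
        /**
         * Returns up to {@code limit} items starting at {@code offset}.
         * Fewer than {@code limit} items signals that the data is exhausted.
         */
        List<T> fetch(int offset, int limit);
    }

    private final BatchSource<T> source;
    private final int batchSize;

    BatchedIterable(BatchSource<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private List<T> batch = Collections.emptyList();
            private int indexInBatch = 0;
            private int offset = 0;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                if (indexInBatch < batch.size()) {
                    return true; // items left in the current batch
                }
                if (exhausted) {
                    return false;
                }
                // Current batch is consumed: fetch the next one, replacing the old.
                batch = source.fetch(offset, batchSize);
                indexInBatch = 0;
                offset += batch.size();
                if (batch.size() < batchSize) {
                    exhausted = true; // a short batch is the last batch
                }
                return !batch.isEmpty();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return batch.get(indexInBatch++);
            }
        };
    }

    public static void main(String[] args) {
        // In-memory stand-in for the remote map values.
        List<Integer> backing = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            backing.add(i);
        }
        BatchSource<Integer> src = (offset, limit) -> backing.subList(
                Math.min(offset, backing.size()),
                Math.min(offset + limit, backing.size()));

        int sum = 0;
        for (int v : new BatchedIterable<>(src, 3)) {
            sum += v;
        }
        System.out.println(sum); // prints 45
    }
}
```

An Iterable like this could be handed to Flux.fromIterable(), which requests elements on demand rather than materialising the whole collection. Because each fetch would be a blocking remote call, you would likely want to run the subscription on a scheduler intended for blocking work rather than on the event loop.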