Efficiency of querying Kafka materialized views


I would like to create a scalable distributed application that uses materialized views instead of a traditional database.

Could you please tell me how efficient gets on materialized views are compared to SELECTs by id? I'm afraid that these "hops" between different instances over REST will slow them down a lot.

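// Ask the Kafka Streams metadata which instance hosts the partition that owns this key
org.apache.kafka.streams.state.HostInfo hostInfo = interactiveQueryService.getHostInfo("store-name",
                        key, keySerializer);

if (interactiveQueryService.getCurrentHostInfo().equals(hostInfo)) {

    // the key belongs to this instance: query the locally available store
}
else {
    // the key belongs to another instance: query the remote host at hostInfo.host():hostInfo.port()
}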

How reliable is this? How can I distinguish a missing element from an unsuccessful "hop"?
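To make the second question concrete, here is a minimal sketch of the local-or-remote branch from the snippet above that returns three distinct outcomes instead of a nullable value, so "the owner answered and the key is absent" is never confused with "the hop itself failed". It is plain Java (16+, for records) and deliberately avoids the Spring and Kafka APIs: `KeyLookup`, `RemoteClient` and the host strings are hypothetical stand-ins, with a `Map` in place of the local state store and a functional interface in place of the REST call. In a real deployment the local branch can fail too (for example while a store is being migrated during a rebalance), and that case would also map to `UNAVAILABLE`.

```java
import java.util.Map;
import java.util.Optional;

public class KeyLookup {

    // Three outcomes that a plain null/empty result would conflate.
    enum Status { FOUND, NOT_FOUND, UNAVAILABLE }

    record Result(Status status, String value) {
        static Result found(String v) { return new Result(Status.FOUND, v); }
        static Result notFound() { return new Result(Status.NOT_FOUND, null); }
        static Result unavailable() { return new Result(Status.UNAVAILABLE, null); }
    }

    // Hypothetical stand-in for an HTTP call to another instance's REST endpoint.
    // Returns Optional.empty() when the owner answered "no such key";
    // throws when the hop itself failed (timeout, connection refused, ...).
    interface RemoteClient {
        Optional<String> fetch(String host, String key) throws Exception;
    }

    static Result lookup(String key, String ownerHost, String localHost,
                         Map<String, String> localStore, RemoteClient remote) {
        if (ownerHost.equals(localHost)) {
            // Local read: the owning store is here, so absence really means "no such key".
            String v = localStore.get(key);
            return v != null ? Result.found(v) : Result.notFound();
        }
        try {
            Optional<String> v = remote.fetch(ownerHost, key);
            return v.map(Result::found).orElseGet(Result::notFound);
        } catch (Exception e) {
            // The hop failed: nothing is known about the key, so do not report it as missing.
            return Result.unavailable();
        }
    }

    public static void main(String[] args) {
        Map<String, String> local = Map.of("a", "1");
        RemoteClient ok = (host, key) -> key.equals("b") ? Optional.of("2") : Optional.empty();
        RemoteClient down = (host, key) -> { throw new java.io.IOException("connection refused"); };

        System.out.println(lookup("a", "h1", "h1", local, ok).status());   // FOUND
        System.out.println(lookup("z", "h1", "h1", local, ok).status());   // NOT_FOUND
        System.out.println(lookup("b", "h2", "h1", local, ok).status());   // FOUND
        System.out.println(lookup("b", "h2", "h1", local, down).status()); // UNAVAILABLE
    }
}
```

A caller can then retry or fail fast on `UNAVAILABLE` while treating `NOT_FOUND` as a definitive answer.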


Solution

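  • The tricky part is avoiding unsuccessful "hops" (requests that land on an instance that does not own the key and have to be forwarded), because each one adds latency. This gets worse the more instances you have, since the state is spread across more of them: if requests reach a random instance and partitions are spread evenly, the chance of hitting the owning instance directly is success_rate = 1/number_of_instances, so with 4 instances about 3 out of 4 requests need an extra hop. There are basically two ways to avoid that: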

    1. A smart load balancer can perform the routing logic before sending the initial request to the microservice. It applies the partitioner logic to the key to obtain the partition ID, compares it against its internal table of consumer group assignments, and then forwards the request to the instance that owns that partition.

    2. It is also possible to represent read requests as streams of events and send both the read events and the write events through a stream processor; the processor responds to read events by emitting the result of the read to an output stream.
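The routing step of option 1 can be sketched as follows, assuming keys are Strings serialized as UTF-8 and that the balancer already holds a partition-to-instance table. `partitionFor` is modeled on the murmur2-based hash that Kafka's default partitioner applies to keyed records (positive hash modulo partition count); it is reproduced here for illustration, so check it against `Utils.murmur2` in your client version before relying on it. `PartitionRouter`, `route` and the `owners` map are hypothetical names, not part of any Kafka API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class PartitionRouter {

    // Murmur2 hash, modeled on org.apache.kafka.common.utils.Utils.murmur2.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;

        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix in the remaining 0-3 bytes (intentional fall-through).
        switch (length % 4) {
            case 3: h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2: h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1: h ^= data[length & ~3] & 0xff;
                    h *= m;
        }

        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Positive hash modulo partition count, as the producer side does for keyed records.
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // Hypothetical routing table lookup: partition id -> instance that currently owns it.
    static String route(String key, int numPartitions, Map<Integer, String> owners) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8); // must match the producer's key serializer
        return owners.get(partitionFor(keyBytes, numPartitions));
    }

    public static void main(String[] args) {
        Map<Integer, String> owners = Map.of(
                0, "app-1:8080", 1, "app-2:8080", 2, "app-3:8080", 3, "app-1:8080");
        for (String key : new String[] {"order-17", "order-42", "customer-7"}) {
            System.out.println(key + " -> " + route(key, 4, owners));
        }
    }
}
```

The balancer must hash exactly the bytes the producer hashed, so the key serializer has to match, and the `owners` table has to be refreshed on every consumer group rebalance; otherwise requests are forwarded to the previous owner and the extra hop comes back.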
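Option 2 can be sketched with an in-memory processor, assuming a hypothetical `requestId` on each read so the caller can match the result it later receives on the output stream. This is plain Java (16+) standing in for a Kafka Streams processor; `ReadWriteProcessor`, `Write`, `Read` and `ReadResult` are illustrative names, and the `List` stands in for the output topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadWriteProcessor {

    // Both reads and writes travel through the same input stream.
    interface Event {}
    record Write(String key, String value) implements Event {}
    record Read(String requestId, String key) implements Event {}

    // Emitted to the output stream in response to a Read.
    record ReadResult(String requestId, String key, String value) {}

    private final Map<String, String> state = new HashMap<>();  // this partition's materialized view
    private final List<ReadResult> output = new ArrayList<>();  // stand-in for the output topic

    void process(Event event) {
        if (event instanceof Write w) {
            state.put(w.key(), w.value());
        } else if (event instanceof Read r) {
            // value == null means the key is absent; the read was still answered.
            output.add(new ReadResult(r.requestId(), r.key(), state.get(r.key())));
        }
    }

    List<ReadResult> output() { return output; }

    public static void main(String[] args) {
        ReadWriteProcessor p = new ReadWriteProcessor();
        p.process(new Write("user-1", "alice"));
        p.process(new Read("req-1", "user-1"));
        p.process(new Write("user-1", "alicia"));
        p.process(new Read("req-2", "user-1"));
        p.process(new Read("req-3", "user-9"));
        p.output().forEach(System.out::println);
    }
}
```

If read events are keyed by the same key as the writes, they land on the same partition, so each read is answered by the instance that owns the state and is processed after the writes that preceded it on that partition. A missing key then comes back as a result with a null value rather than as a failed hop.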