Background
I'm following this tutorial to take my first steps towards building a distributed environment: http://docs.spring.io/spring-cloud-dataflow-server-kubernetes/docs/current-SNAPSHOT/reference/htmlsingle/#_getting_started
What I'm trying to achieve is a distributed job queue for running legacy shell and C++ applications. The jobs should be distributed across several servers based on their load.
I do not intend to split the data of individual jobs, nor is it feasible (and it would hurt performance anyway, for this kind of workload) to process them in parallel.
So, if you like, I intend to misuse big-data machinery for this kind of task.
Question
Given the background above, under which circumstances would a Kafka message bus begin to congest?
Say I have 4 servers processing the job queue and I put many files, each tens or hundreds of MB, onto the queue. Will Kafka deliver each of those messages to a specific node, or will all nodes receive the same message? In the latter case, I guess my cluster can only scale as far as Kafka itself can handle that traffic. What other causes of congestion are there in this scenario?
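To make the setup concrete, this is roughly how I picture each of the 4 worker servers consuming jobs (just a sketch; the topic name "jobs", the group id and the runJob() helper are placeholders I made up):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class JobWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");
            // all worker servers would share one group id, so ideally each job
            // message ends up on exactly one of them -- which is part of my question
            props.put("group.id", "job-workers");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("jobs"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        runJob(record.value()); // hand the payload to the legacy shell/C++ job
                    }
                }
            }
        }

        private static void runJob(String payload) {
            // placeholder: invoke the legacy application here
        }
    }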
It could well be that Kafka isn't the right choice for what I'm trying to do, but that's also why I'm asking.
Kafka is not a file server. The default maximum message size (max.message.bytes) is 1000012 bytes, i.e. roughly 1 MB. Do not use Kafka as a file server; it will not make you happy.
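For a sense of scale, even pushing payloads of that size through Kafka means raising several limits in lockstep. A sketch of the knobs involved (the 128 MB value is purely illustrative, not a recommendation):

    import java.util.Properties;

    public class KafkaSizeLimits {
        public static void main(String[] args) {
            // Producer: max.request.size caps a single produce request (default ~1 MB)
            Properties producerProps = new Properties();
            producerProps.put("max.request.size", "134217728");        // ~128 MB, illustrative only

            // Consumer: max.partition.fetch.bytes must be at least the largest expected record
            Properties consumerProps = new Properties();
            consumerProps.put("max.partition.fetch.bytes", "134217728");

            // Broker/topic side (server.properties or topic config), not settable from here:
            //   message.max.bytes=134217728          (broker-wide limit, default ~1 MB)
            //   max.message.bytes=134217728          (per-topic override)
            //   replica.fetch.max.bytes=134217728    (so replication can keep up)
        }
    }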
A possible pattern: use a long-term storage solution (a SAN, S3, etc.) and use Kafka to communicate URIs pointing into that storage.
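A rough sketch of the producer side of that pattern (the topic name, key and URI are placeholders; the upload to S3/the SAN happens with whatever client you already use):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class JobSubmitter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // 1. upload the big input file to long-term storage first (S3, SAN, ...)
                // 2. publish only the URI: a few hundred bytes instead of hundreds of MB
                String uri = "s3://job-input/batch-42/input.dat";   // placeholder
                producer.send(new ProducerRecord<>("jobs", "job-42", uri));
            }
        }
    }

The workers then fetch the file from storage themselves, so the broker only ever moves small messages and the heavy data travels over a system that is built for it.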