I'm playing with Kafka-Connect. I've got the HDFS connector working in both stand-alone mode and distributed mode.
They advertise that the workers (which are responsible for running the connectors) can be managed via YARN. However, I haven't seen any documentation that describes how to achieve this.
How do I go about getting YARN to execute workers? If there is no specific approach, are there generic how-tos for getting an application to run within YARN?
I've used YARN with Spark via spark-submit; however, I cannot figure out how to get the connector to run in YARN.
You can theoretically run anything on YARN, even a simple hello-world program, which is why saying Kafka-Connect runs on YARN is technically correct. The caveat is that getting Kafka-Connect to run on YARN will take a fair amount of elbow grease at the moment. There are two ways to do it: write your own YARN application master against the YARN API, or use Apache Slider.
You'll have to read quite a bit of documentation to get it working, but the idea behind Slider is that you can get any program to run on YARN, without dealing with the YARN API or writing a YARN application master, by doing the following:
Slider handles container deployment and recovery of failed containers for you, which is nice. Slider is also slated to become a native part of YARN when YARN 3.0 is released.
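Whichever launcher you end up using, each container ultimately has to start a distributed-mode worker with a properties file, and workers that share the same `group.id` join the same Connect cluster. Here is a minimal sketch of that per-container step, assuming the settings arrive as environment variables. The function names, the `CONNECT_*` and `KAFKA_HOME` variable names, the default values, and the `/opt/kafka` path are all made up for illustration; only the property keys and `bin/connect-distributed.sh` come from Kafka itself.

```python
import os
import tempfile


def render_worker_properties(env):
    """Build distributed-mode worker properties from a mapping of settings.

    The CONNECT_* names are hypothetical; the keys on the left are standard
    Kafka Connect worker properties.
    """
    props = {
        "bootstrap.servers": env.get("CONNECT_BOOTSTRAP_SERVERS", "localhost:9092"),
        # Workers with the same group.id form one Connect cluster.
        "group.id": env.get("CONNECT_GROUP_ID", "connect-cluster"),
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Topics where distributed mode keeps connector configs, offsets and status.
        "config.storage.topic": "connect-configs",
        "offset.storage.topic": "connect-offsets",
        "status.storage.topic": "connect-status",
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


def launch_command(kafka_home, properties_path):
    """Return the argv a container would run to start one worker."""
    return [f"{kafka_home}/bin/connect-distributed.sh", properties_path]


if __name__ == "__main__":
    # Write the properties to a scratch file and print the command,
    # rather than actually starting a worker.
    path = os.path.join(tempfile.mkdtemp(), "worker.properties")
    with open(path, "w") as fh:
        fh.write(render_worker_properties(os.environ))
    print(" ".join(launch_command(os.environ.get("KAFKA_HOME", "/opt/kafka"), path)))
```

Because the workers coordinate through Kafka itself, the launcher only needs to start N copies of this command and restart any that die; it doesn't need to know anything about the connectors.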
As a side note, getting Kafka-Connect to deploy on Kubernetes or Mesos/Marathon is probably going to be easier. The basic workflow to do that would be: