
How to run a Kafka connect worker in YARN?


I'm playing with Kafka-Connect. I've got the HDFS connector working both in stand-alone mode and distributed mode.

They advertise that the workers (which are responsible for running the connectors) can be managed via YARN. However, I haven't seen any documentation that describes how to achieve this.

How do I go about getting YARN to execute workers? If there is no specific approach, are there generic how-tos on getting an application to run within YARN?

I've used YARN with Spark via spark-submit; however, I cannot figure out how to get the connector to run in YARN.


Solution

  • You can theoretically run anything on YARN, even a simple hello world program, which is why saying that Kafka-Connect runs on YARN is technically correct. The caveat is that getting Kafka-Connect to run on YARN will take a fair amount of elbow grease at the moment. There are two ways to do it:

    1. Directly talk to the YARN API to acquire a container, deploy the Kafka-Connect binaries and launch Kafka-Connect.
    2. Use the separate Slider project (https://slider.incubator.apache.org/docs/getting_started.html), which Stephane has already mentioned in the comments.

    Slider

    You'll have to read quite a bit of documentation to get it working, but the idea behind Slider is that you can get any program to run on YARN without dealing with the YARN API or writing a YARN application master. You do the following:

    • Create a Slider package out of your program
    • Define a configuration for your package
    • Use the Slider CLI to deploy your application onto YARN
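
    As a rough illustration of the configuration step, the sketch below generates a Slider-style resources.json describing how many worker containers to request and how much memory each gets. The component name KAFKA_CONNECT_WORKER, the instance count, and the memory and vcore values are hypothetical placeholders, and a complete deployment would also need an application package and an appConfig.json that are not shown here. Treat it as a starting point under those assumptions, not a verified Kafka-Connect setup.

```python
import json


def build_slider_resources(component: str, instances: int,
                           memory_mb: int, vcores: int = 1) -> dict:
    """Build a Slider-style resources.json document as a Python dict.

    `component` is a hypothetical component name; it would have to match
    the component declared in your own Slider application package.
    """
    return {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {},
        "components": {
            # Entry for Slider's own application master.
            "slider-appmaster": {},
            # One entry per component type; Slider expects string values.
            component: {
                "yarn.role.priority": "1",
                "yarn.component.instances": str(instances),
                "yarn.memory": str(memory_mb),
                "yarn.vcores": str(vcores),
            },
        },
    }


if __name__ == "__main__":
    # Assumed values: two workers with 1 GiB each.
    resources = build_slider_resources("KAFKA_CONNECT_WORKER", 2, 1024)
    print(json.dumps(resources, indent=2))
```

    Redirecting the output to resources.json gives you a file that can be passed to the Slider CLI when creating the application, alongside the package and appConfig.json you define yourself.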

    Slider handles container deployment and recovery of failed containers for you, which is nice. Slider is also slated to become a native part of YARN once YARN 3.0 is released.

    Alternatives

    As a side note, getting Kafka-Connect to deploy on Kubernetes or Mesos / Marathon is probably going to be easier. The basic workflow would be:

    • Build a Kafka-Connect Docker image, or just use Confluent's Docker image
    • Create a deployment config for Kubernetes or Marathon
    • Click a button / run a command
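
    For the Kubernetes route, the deployment config could look roughly like what the sketch below generates. It builds a Deployment manifest for distributed-mode workers based on Confluent's cp-kafka-connect image and prints it as JSON, which kubectl accepts as well as YAML. The deployment name, image tag, broker address, topic names, and replica count are all assumptions for illustration; adjust them to your cluster.

```python
import json


def build_connect_deployment(name: str, image: str,
                             bootstrap_servers: str, replicas: int = 2) -> dict:
    """Build a Kubernetes Deployment manifest for Kafka Connect workers."""
    # Settings the Confluent image reads from CONNECT_* environment variables.
    settings = {
        "CONNECT_BOOTSTRAP_SERVERS": bootstrap_servers,
        "CONNECT_GROUP_ID": name,
        "CONNECT_CONFIG_STORAGE_TOPIC": f"{name}-configs",
        "CONNECT_OFFSET_STORAGE_TOPIC": f"{name}-offsets",
        "CONNECT_STATUS_STORAGE_TOPIC": f"{name}-status",
        "CONNECT_KEY_CONVERTER": "org.apache.kafka.connect.json.JsonConverter",
        "CONNECT_VALUE_CONVERTER": "org.apache.kafka.connect.json.JsonConverter",
    }
    env = [{"name": key, "value": value} for key, value in settings.items()]
    # Advertise each worker's pod IP so workers can reach each other's REST API.
    env.append({
        "name": "CONNECT_REST_ADVERTISED_HOST_NAME",
        "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}},
    })
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "connect-worker",
                        "image": image,
                        # 8083 is the default Kafka Connect REST port.
                        "ports": [{"containerPort": 8083}],
                        "env": env,
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; the image tag and broker address are assumptions.
    manifest = build_connect_deployment(
        "kafka-connect", "confluentinc/cp-kafka-connect:latest", "kafka:9092")
    print(json.dumps(manifest, indent=2))
```

    If the script is saved as, say, make_deployment.py (a hypothetical file name), the "run a command" step could then be `python make_deployment.py | kubectl apply -f -`.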

    Tutorials

    • A good Mesos / Marathon tutorial can be found here
    • Kubernetes tutorial here
    • Confluent Kubernetes Helm Charts here