
Read/Write with NiFi to Kafka in Cloudera Data Platform (CDP) Public Cloud


NiFi and Kafka are now both available in Cloudera Data Platform (CDP) Public Cloud. NiFi is great at talking to everything, and Kafka is a mainstream message bus, so I wondered:

What are the minimal steps needed to produce data to, and consume data from, Kafka with Apache NiFi within CDP Public Cloud?

Ideally, I am looking for steps that work in any cloud, for instance Amazon AWS and Microsoft Azure.

I am satisfied with answers that follow best practices and work with the platform's default configuration, but common alternatives are welcome as well.


Solution

  • More form factors will become available in the future; for now I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with Kafka. (The answer still works if both are on the same Data Hub.)

    Prerequisites

    • Data Hub(s) with NiFi and Kafka
    • Permission to access these (e.g. add processor, create Kafka topic)
    • Know your Workload User Name (CDP Management Console > click your name (bottom left) > click Profile)
    • You should have set your Workload Password in the same location

    These steps allow you to Produce data from NiFi to Kafka in CDP Public Cloud

    Unless mentioned otherwise, I have kept everything to its default settings.

    In Kafka Data Hub Cluster:

    1. Gather the FQDNs of the brokers and the ports they use.
    • If you have Streams Messaging Manager: go to the Brokers tab to see each FQDN and port already combined.
    • If you cannot use Streams Messaging Manager: go to the Hardware tab of your Data Hub with Kafka and get the FQDNs of the relevant nodes (currently these are called broker). Then append :portnumber to each one. The default port is 9093.
    2. Combine the entries in this format: FQDN:port,FQDN:port,FQDN:port. It should now look something like this:

    broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
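Step 2 is plain string joining. As a sketch, assuming you have copied the broker FQDNs into a list, a hypothetical helper such as `build_broker_list` produces the value for NiFi's Kafka Brokers property (the hostnames are the placeholder ones from the example above, and 9093 is the default port mentioned in step 1):

```python
def build_broker_list(fqdns, port=9093):
    """Join broker FQDNs into the FQDN:port,FQDN:port,... format.

    Hypothetical helper for illustration; 9093 is the default broker
    port, so pass another value if your cluster uses a different one.
    """
    # Drop surrounding whitespace and empty entries copied from the UI.
    hosts = [h.strip() for h in fqdns if h.strip()]
    return ",".join(f"{host}:{port}" for host in hosts)


if __name__ == "__main__":
    brokers = build_broker_list(["broker1.abc", "broker2.abc", "broker3.abc"])
    print(brokers)  # broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
```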

    In NiFi GUI:

    1. Make sure you have some data in NiFi to produce, for example by using the GenerateFlowFile processor
    2. Select the relevant processor for writing to Kafka, for example PublishKafka_2_0, and configure it as follows:
    • Settings
      • Automatically terminate relationships: tick both success and failure
    • Properties
      • Kafka Brokers: The combined list we created earlier
      • Security Protocol: SASL_SSL
      • SASL Mechanism: PLAIN
      • SSL Context Service: Default NiFi SSL Context Service
      • Username: your Workload User Name (see prerequisites above)
      • Password: your Workload Password
      • Topic Name: dennis
      • Use Transactions: false
      • Max Metadata Wait Time: 30 sec
    3. Connect your GenerateFlowFile processor to your PublishKafka_2_0 processor and start the flow.
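If you are more used to plain Kafka clients, the PublishKafka_2_0 connection properties above roughly correspond to the standard Kafka client settings in this sketch. It is for orientation only, not what NiFi generates internally: the helper names, the example user `jdoe` and its password are hypothetical, and the TLS truststore settings (which NiFi takes from its SSL Context Service) are deliberately left out.

```python
def build_producer_config(brokers, workload_user, workload_password):
    """Sketch of Kafka client properties matching the NiFi settings above:
    SASL_SSL with the PLAIN mechanism and the CDP workload credentials.

    Assumption: the password contains no double quotes, because it is
    embedded verbatim in the JAAS configuration string.
    """
    if '"' in workload_password:
        raise ValueError("password contains a double quote; escape it first")
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{workload_user}" password="{workload_password}";'
    )
    return {
        "bootstrap.servers": brokers,     # NiFi: Kafka Brokers
        "security.protocol": "SASL_SSL",  # NiFi: Security Protocol
        "sasl.mechanism": "PLAIN",        # NiFi: SASL Mechanism
        "sasl.jaas.config": jaas,         # NiFi: Username + Password
        # Truststore settings are omitted: NiFi supplies them through the
        # SSL Context Service; a standalone client would need its own.
    }


def to_properties(config):
    """Render the settings as key=value lines, as in a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


if __name__ == "__main__":
    cfg = build_producer_config("broker1.abc:9093", "jdoe", "s3cret")
    print(to_properties(cfg))
```

Pairing PLAIN with SASL_SSL means the Workload User Name and Password travel inside the TLS connection rather than in clear text.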

    These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation. Note that it is best practice to create topics explicitly (this example relies on Kafka's ability to create a topic automatically when a producer first writes to it).

    These steps allow you to Consume data with NiFi from Kafka in CDP Public Cloud

    A good way to check whether data was written to Kafka is to consume it again.

    In NiFi GUI:

    1. Create a Kafka consumption processor, for instance ConsumeKafka_2_0, and configure its Properties as follows:
    • Kafka Brokers, Security Protocol, SASL Mechanism, SSL Context Service, Username, Password, Topic Name: All the same as in our producer example above
    • Consumer Group: 1
    • Offset Reset: earliest
    2. Create another processor, or a funnel, to send the messages to, and start the consumption processor.
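The consumer reuses the producer's connection settings and only adds two consumer-specific ones. A minimal sketch in standard Kafka client property names, assuming `group.id` and `auto.offset.reset` are the client settings behind NiFi's Consumer Group and Offset Reset; the helper name and the example connection values are hypothetical:

```python
# Values accepted by the Kafka consumer setting auto.offset.reset.
VALID_OFFSET_RESETS = {"earliest", "latest", "none"}


def build_consumer_config(connection, group_id="1", offset_reset="earliest"):
    """Return consumer properties: the shared connection settings plus the
    two consumer-only settings from the steps above."""
    if offset_reset not in VALID_OFFSET_RESETS:
        raise ValueError(f"unsupported auto.offset.reset: {offset_reset!r}")
    config = dict(connection)  # copy so the shared settings stay untouched
    config["group.id"] = group_id                # NiFi: Consumer Group
    config["auto.offset.reset"] = offset_reset   # NiFi: Offset Reset
    return config


if __name__ == "__main__":
    shared = {
        "bootstrap.servers": "broker1.abc:9093,broker2.abc:9093,broker3.abc:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
    }
    print(build_consumer_config(shared))
```

With `earliest`, a consumer group that has no committed offsets yet starts at the beginning of the topic, so messages published before the consumer started are read as well; `latest` would only pick up messages that arrive afterwards.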

    And that is it: within 30 seconds you should see the data you published to Kafka flowing into NiFi again.


    Full Disclosure: I am an employee of Cloudera, the driving force behind NiFi.