Tags: docker, apache-kafka, avro, apache-nifi

Should Avro go in a base image for all services in a Docker Swarm? NiFi --> Kafka --> PostgreSQL


I'm new to Docker. Also new to NiFi and Kafka...

Question: How do I manage Avro with a Docker stack? Do I install Avro in every image referenced by my docker-compose.yml file, or should Avro live in a separate container somehow?


Details:

My vision is a 5-machine swarm running ETL processes through NiFi into Kafka (using Avro), then consuming the messages into 3 PostgreSQL containers, each with its own database. Another container will run a webserver to provide access to the DBs.

So that's three application containers (NiFi, Kafka, the webserver) plus three PostgreSQL container instances.

Other services/microservices can live in the existing containers, or eventually spin off into separate containers (e.g., an API).
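
A stack like this is usually wired together in a single compose file rather than by baking Avro into a base image. Below is a hypothetical docker-compose sketch of the pieces discussed; the image tags, hostnames, and ports are illustrative assumptions, not a tested deployment:

```yaml
# Illustrative sketch only; image versions and hostnames are assumptions.
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  schema-registry:
    image: confluentinc/cp-schema-registry:7.4.0
    depends_on: [kafka]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
  nifi:
    image: apache/nifi:latest
    ports: ["8080:8080"]
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example   # placeholder credential
```

Note there is no "Avro service" here: Avro is a serialization format, so it ships as a library inside whichever containers read or write it.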


Solution

  • Apache NiFi includes all the libraries necessary to read and write Avro data.

    You might also want to consider running the Confluent Schema Registry in its own container for centrally managing your Avro schemas; NiFi integrates with the registry.

    Kafka itself doesn't care that you're sending it Avro; only the clients care how the data is encoded and decoded.

    If you only care about Kafka, Avro, and PostgreSQL, try Kafka Connect's JDBC sink connector; then managing NiFi isn't necessary.

    If you do want NiFi and Kafka in their respective cluster setups, both depend on a ZooKeeper instance.
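
If you go the Kafka Connect route, the JDBC sink is driven by a JSON connector config posted to the Connect REST API. A rough sketch is below; the connector name, topic, hostnames, and credentials are illustrative assumptions:

```json
{
  "name": "postgres-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://postgres:5432/mydb",
    "topics": "etl-topic",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "insert.mode": "insert",
    "auto.create": "true"
  }
}
```

With the AvroConverter pointed at the registry, Connect decodes the Avro messages and writes rows to PostgreSQL without any custom consumer code.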
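
To see why the broker doesn't care about Avro, here is a minimal, dependency-free sketch of Avro's binary encoding for a record with a `long` id and a `string` name (field names are illustrative). In practice a library such as fastavro or NiFi's record processors does this for you; the point is that the result is just bytes, which Kafka stores verbatim:

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer as Avro's zigzag varint (long type)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: -1 -> 1, 1 -> 2, 42 -> 84, ...
    out = bytearray()
    while True:
        b = z & 0x7F
        z >>= 7
        if z:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Avro string: length (as long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

def encode_user(user_id: int, name: str) -> bytes:
    """Encode a record {"id": long, "name": string}, fields in schema order."""
    return zigzag_varint(user_id) + encode_string(name)

payload = encode_user(42, "alice")
# The broker stores `payload` as opaque bytes; only the producer and
# consumer need the schema (or a registry lookup) to interpret them.
```

This is also why a schema registry helps: producer and consumer must agree on the schema out of band, since nothing in the bytes themselves is self-describing beyond the Avro rules.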