Tags: docker, apache-kafka, docker-swarm

How to persist kafka topics beyond host restart in swarm configuration


I'm using wurstmeister/kafka-docker, following the swarm configuration in the kafka-docker wiki, and, per the general Docker instructions, I've added a volume. I discovered that if you don't set it explicitly, the Kafka log directory is defined in part by $HOSTNAME (which, in this network, I believe is the container ID): start-kafka.sh does export KAFKA_LOG_DIRS="/kafka/kafka-logs-$HOSTNAME". Since $HOSTNAME changes between restarts, the broker would not find its previous logs. (Perhaps this should use HOSTNAME_COMMAND instead?) Since there's only one Kafka instance running per host, I set it to a static value. So my resulting docker-compose-swarm.yml looks like:

version: '3.2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:latest
    ports:
      - target: 9094
        published: 9094
        protocol: tcp
        mode: host
    environment:
      HOSTNAME_COMMAND: "docker info | grep ^Name: | cut -d' ' -f 2"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      # $HOSTNAME (the container ID?) is used by default and changes between restarts, so pin it to a static value for now:
      KAFKA_LOG_DIRS: "/kafka/kafka-logs/aaa"

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - kafkamount:/kafka
volumes:
  kafkamount:

Basically, I added KAFKA_LOG_DIRS, added the kafkamount named volume, and referenced it in the kafka service.

I deploy the stack to a swarm with three nodes running on docker-machine: dar0, dar1, and dar2. I also have a fourth VM, default, that I use for testing.
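For reference, the deploy step is something like the following, with darstack as the stack name (that's the prefix that shows up in the volume path below):

docker stack deploy -c docker-compose-swarm.yml darstack

I test connectivity with: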

docker run -i --network host confluentinc/cp-kafkacat kafkacat -b dar0:9094,dar1:9094,dar2:9094  -t test  -P

in one shell, and:

docker run --tty --network host confluentinc/cp-kafkacat kafkacat -b dar0:9094,dar1:9094,dar2:9094 -C  -t test

This all works, and I can see that data is going into /var/lib/docker/volumes/darstack_kafkamount/_data/kafka-logs/aaa.
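For instance, to eyeball that directory on one of the nodes (a quick sanity check; this assumes the default boot2docker image, where sudo is available):

docker-machine ssh dar0 "sudo ls /var/lib/docker/volumes/darstack_kafkamount/_data/kafka-logs/aaa"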

However, if I shut down the VMs and then restart them:

$ docker-machine stop dar0 dar1 dar2
...
$ docker-machine start dar0 dar1 dar2

I usually get this error:

$ docker run --tty --network host confluentinc/cp-kafkacat kafkacat -b dar0:9094,dar1:9094,dar2:9094 -C  -t test
% ERROR: Topic test error: Broker: Leader not available

and no data from the topic. If I run it again, it sometimes works and I get the data in the topic, but sometimes I get nothing.

Is this perhaps because the broker IDs are assigned differently, depending on which instance started first? Or do I also need to add a volume for zookeeper? (I haven't seen anyone mention that.) Something else?
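One way to investigate the broker-ID theory is to ask zookeeper what's registered before and after a restart. A minimal sketch, assuming zkCli.sh lives under /opt/zookeeper-3.4.13 in the wurstmeister/zookeeper image and the zookeeper container is on the current node:

docker exec -it $(docker ps -qf name=zookeeper) /opt/zookeeper-3.4.13/bin/zkCli.sh ls /brokers/ids

If the IDs listed there change from one boot to the next, the brokers are re-registering under new identities.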

EDIT: To eliminate the possibility that it's something about the broker IDs, I added a BROKER_ID_COMMAND:

BROKER_ID_COMMAND: "docker info -f '{{.Swarm.NodeAddr}}' | sed 's/.*\\.\\([0-9]\\+\\)/\\1/'"

This uses the last octet of the node's IP as the broker ID (which is a bit brittle, but gets the job done). It seems to work, but doesn't fix the problem that a client doesn't see the data after a restart.
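For example, feeding the sed expression a hypothetical node address of 192.168.99.101 (the range docker-machine typically hands out):

$ echo 192.168.99.101 | sed 's/.*\.\([0-9]\+\)/\1/'
101

so that broker would register with ID 101.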


Solution

  • After some experimenting, I've discovered that adding volumes for zookeeper, in conjunction with the BROKER_ID_COMMAND, seems to do the trick.

    If I removed either one, it stopped working. I also added a depends_on from kafka to zookeeper, but I'm not sure that's essential.

    services:
      zookeeper:
    ...
        volumes:
          - zookeeperconf:/opt/zookeeper-3.4.13/conf
          - zookeeperdata:/opt/zookeeper-3.4.13/data
    ...
      kafka:
        ...
        environment:
          ...
          BROKER_ID_COMMAND: "docker info -f '{{.Swarm.NodeAddr}}' | sed 's/.*\\.\\([0-9]\\+\\)/\\1/'"
        ...
        depends_on:
          - zookeeper
    volumes:
       ...
       zookeeperconf:
       zookeeperdata:
    

    This is in addition to the configuration I showed in the original post.
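
    Putting it all together (the original compose file plus these additions; the zookeeper paths match the 3.4.13 layout inside the wurstmeister/zookeeper image), the full docker-compose-swarm.yml ends up looking something like this:

    version: '3.2'
    services:
      zookeeper:
        image: wurstmeister/zookeeper
        ports:
          - "2181:2181"
        volumes:
          - zookeeperconf:/opt/zookeeper-3.4.13/conf
          - zookeeperdata:/opt/zookeeper-3.4.13/data
      kafka:
        image: wurstmeister/kafka:latest
        ports:
          - target: 9094
            published: 9094
            protocol: tcp
            mode: host
        environment:
          HOSTNAME_COMMAND: "docker info | grep ^Name: | cut -d' ' -f 2"
          BROKER_ID_COMMAND: "docker info -f '{{.Swarm.NodeAddr}}' | sed 's/.*\\.\\([0-9]\\+\\)/\\1/'"
          KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
          KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
          KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
          KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
          KAFKA_LOG_DIRS: "/kafka/kafka-logs/aaa"
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
          - kafkamount:/kafka
        depends_on:
          - zookeeper
    volumes:
      kafkamount:
      zookeeperconf:
      zookeeperdata: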