Kafka doesn't work with external NFS Volume


I am trying to run Kafka with a mounted NFS volume, but the broker fails to start with the following exception:

    [2020-03-15 09:36:11,580] ERROR There was an error in one of the threads during logs loading: org.apache.kafka.common.KafkaException: Found directory /var/lib/kafka/data/.snapshot, '.snapshot' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion).
    Kafka's log directories (and children) should only contain Kafka topic data. (kafka.log.LogManager)
    [2020-03-15 09:36:11,582] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
    org.apache.kafka.common.KafkaException: Found directory /var/lib/kafka/data/.snapshot, '.snapshot' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion).
    Kafka's log directories (and children) should only contain Kafka topic data.
        at kafka.log.Log$.exception$1(Log.scala:2150)
        at kafka.log.Log$.parseTopicPartitionName(Log.scala:2157)
        at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:260)
        at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$11$$anonfun$apply$15$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:345)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

This is my docker-compose file:

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.3.2
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    volumes:
      - zk-data:/var/lib/zookeeper/data:nocopy
      - zk-log:/var/lib/zookeeper/log:nocopy

  kafka:
    image: confluentinc/cp-kafka:5.3.2
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
    volumes:
      - kf-data:/var/lib/kafka/data:nocopy


volumes:
  zk-data:
    driver: local
    driver_opts:
      type: "nfs"
      o: addr=18.0.3.227 #IP of NFS
      device: ":/opt/data/zk-data"
  zk-log:
    driver: local
    driver_opts:
      type: "nfs"
      o: addr=18.0.3.227
      device: ":/opt/data/zk-log"
  kf-data:
    driver: local
    driver_opts:
      type: "nfs"
      o: addr=18.0.3.227
      device: ":/opt/data/kf-data"

On the NFS server, the exported directory contains a .snapshot folder:

ls -la /opt/data/kf-data/.snapshot

total 80
drwxrwxrwx 33 root   root         12288 Mar 28 00:10 .
drwx------  2 root domain^users  4096 Feb 21 19:20 ..
drwx------  2 root domain^users  4096 Feb 13 11:06 daily.2020-02-14_0010
drwx------  2 root domain^users  4096 Feb 13 11:06 daily.2020-02-15_0010
drwx------  2 root domain^users  4096 Feb 13 11:06 daily.2020-02-16_0010
drwx------  2 root domain^users  4096 Feb 13 11:06 daily.2020-02-17_0010
drwx------  2 root domain^users  4096 Feb 21 19:20 snapmirror.ka938443-8ea1-22e8-6608-00a067d1a20a_2148891236.2020-02-27_180700

There is a hidden folder named .snapshot. It is generated automatically by the NFS server and cannot be removed. This is why Kafka complains: Found directory /var/lib/kafka/data/.snapshot, '.snapshot' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion).
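To make the naming rule from the error message concrete, here is a minimal Python sketch of such a check. It is only an approximation written from the wording of the message, not Kafka's actual implementation; the function name `parse_topic_partition`, the topic name `orders`, and the assumption that the unique ID is a hex string are all made up for illustration.

```python
import re

# Approximation of the rule quoted in the error message:
#   topic-partition   or   topic-partition.uniqueId-delete
# Assumption: uniqueId is a hex string. This is NOT Kafka's own code.
DELETE_FORM = re.compile(r"^(?P<name>.+)\.[0-9a-fA-F]+-delete$")


def parse_topic_partition(dir_name):
    """Return (topic, partition) or raise ValueError for a non-conforming name."""
    match = DELETE_FORM.match(dir_name)
    # Strip the ".uniqueId-delete" suffix if the directory is marked for deletion.
    name = match.group("name") if match else dir_name
    # The partition number is whatever follows the last dash.
    topic, dash, partition = name.rpartition("-")
    if not dash or not topic or not partition.isdecimal():
        raise ValueError(f"{dir_name!r} is not in the form of topic-partition")
    return topic, int(partition)


if __name__ == "__main__":
    for candidate in ("orders-0", "orders-3.abc123-delete", ".snapshot"):
        try:
            print(candidate, "->", parse_topic_partition(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

A name like `.snapshot` contains no dash followed by a partition number, so under this rule it can never be accepted as a topic directory.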

This looks like a general Kafka limitation. Is there any configuration option or workaround that lets Kafka use an external NFS volume?

Any ideas would be appreciated!


Solution

  • If you are using NetApp as the NFS platform, this may help: disabling .snapshot access in NetApp is a global vFiler setting; it cannot be set per folder or per share.

    If you cannot turn off access to .snapshot, there is no solution other than switching to an NFS platform that does not generate a .snapshot folder in every folder.
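To confirm whether snapshot access is actually exposed on a given mount, a small check like the following might help. It is a sketch under assumptions: the function name `snapshot_visibility` is hypothetical, and it distinguishes "listed" (the entry appears in a directory listing, which is what the "Found directory" error suggests the broker reacts to) from "reachable" (the path can be opened directly even if it is not listed).

```python
import os


def snapshot_visibility(mount_path, name=".snapshot"):
    """Report how a snapshot directory shows up under mount_path."""
    # True if the entry appears when the directory is listed.
    listed = name in os.listdir(mount_path)
    # True if the path resolves to a directory when accessed directly.
    reachable = os.path.isdir(os.path.join(mount_path, name))
    return {"listed": listed, "reachable": reachable}


if __name__ == "__main__":
    # Path taken from the question; adjust to the directory the broker uses.
    print(snapshot_visibility("/var/lib/kafka/data"))
```

If `listed` is still true after changing the storage settings, the broker would presumably keep failing in the same way.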