
Kafka fails to start with fatal error "Duplicate log directories found"


Kafka fails to start with the following error:

Fatal error during KafkaServer startup. Prepare to shutdown
java.lang.IllegalArgumentException: Duplicate log directories found: /node5/kafka/data/logs-47, /node7/kafka/data/logs-47!
    at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:155)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Kafka 0.9.0.1 is deployed as part of Cloudera.

What does the issue mean?

Is there a workaround or solution to this problem? I couldn't find one.

I hit this error after restarting the broker following an under-replicated partitions issue on some topic partitions.

The broker's log contained the following errors before the restart:

    java.lang.IllegalStateException: Compaction for partition [logs,47] cannot be aborted and paused since it is in LogCleaningPaused state.
        at kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply$mcV$sp(LogCleanerManager.scala:149)
        at kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
        at kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
...

Cloudera Data Directories configuration:

/node/kafka/data
/node2/kafka/data
...
/node8/kafka/data
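With multiple data directories configured like this, one way to check for duplicated partition directories before the broker even tries to start is to compare the contents of each data directory. A minimal sketch (the helper name and the example paths are illustrative, not Kafka tooling):

```shell
# Sketch: report partition directories that appear under more than one of the
# Kafka data directories configured in log.dirs. The function name and paths
# are assumptions for illustration, not part of Kafka itself.
find_duplicate_partition_dirs() {
  # $@ : the broker's data directories
  for dir in "$@"; do
    ls "$dir"           # one partition directory name per line when piped
  done | sort | uniq -d  # print names seen in more than one data directory
}

# Example usage: find_duplicate_partition_dirs /node*/kafka/data
```

Any name this prints (e.g. `logs-47`) exists under more than one data directory and would trigger the `Duplicate log directories found` check at startup.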

UPDATE

I've inspected the contents of the duplicate directories and found that the newer directory seems to hold an empty log segment:

ls -l /node5/kafka/data/logs-47

-rw-r--r-- 1 kafka kafka 10485760 Mar  9 05:35 00000000000000000000.index
-rw-r--r-- 1 kafka kafka        0 Mar  8 13:12 00000000000000000000.log

While the older folder's segment starts at a non-zero offset:

ls /node7/kafka/data/logs-47

-rw-r--r-- 1 kafka kafka 10485760 Mar  9 05:35 00000000000000366115.index
-rw-r--r-- 1 kafka kafka        0 Nov 25 10:13 00000000000000366115.log

Solution

  • Your error says that more than one directory on a single broker contains data for partition 47 of a topic named logs.

    The broker cannot finish loading logs at startup until it knows which directory is the correct one (the check is in LogManager.loadLogs, as the stack trace shows).

    If you have some idea of what should be in the topic, you can dump the log segments and inspect the messages.

    If you don't know what data should be there, and the partitions are replicated to other, healthy brokers, then delete the faulty partition directories and restart the broker, letting replication restore the missing data.
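
Both options above can be sketched as shell commands. This assumes a standard Kafka installation where `kafka-run-class.sh` is on hand; the segment and partition paths are taken from the listings in the question and may differ on your broker. Verify via replication status that the data exists elsewhere before deleting anything:

```shell
# 1) Inspect a suspect segment to judge which copy of logs-47 holds real data.
#    DumpLogSegments ships with Kafka; --print-data-log prints message payloads.
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /node7/kafka/data/logs-47/00000000000000366115.log \
  --print-data-log

# 2) If the partition is fully replicated on healthy brokers, stop this broker,
#    remove the faulty copy, and restart so replication restores the data.
rm -r /node5/kafka/data/logs-47
```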