Tags: logging, apache-kafka, partition, kafka-topic

Understanding Kafka log.dirs


I have a Kafka cluster, and log.dirs=/data/kafka in server.properties points to the data directory. My DATA partition keeps filling up because of these logs, which take up a large part of it. (I mean the binary log files in the topic directories, such as 000000000000000.log.) I read about this parameter in the documentation ("log.dirs: The directories in which the log data is kept. If not set, the value in log.dir is used"), but I do not fully understand what it means yet.

Can these files be deleted, and which retention setting should be configured? Is it also recommended to keep them separate from the data directory? Thanks.


Solution

  • A Kafka topic is a logical grouping of one or more partitions. Each partition is stored on disk as a directory (named <topic>-<partition>) containing one or more log segment files, so the data you publish to Kafka lives in these files (the logs).

    log.dirs tells Kafka where to create these files. Whenever a new partition is created (by adding partitions to an existing topic or by creating a new topic), new files appear under log.dirs.

    You should not delete data from this folder manually. Use log.retention.hours to configure how long Kafka keeps your data (the default is 168 hours, i.e. 7 days); once a log segment is older than that, Kafka deletes it for you.
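
    To see which topics are actually filling the disk before you tune retention, something along these lines may help. It is only a minimal sketch and not part of Kafka: the function names (partition_sizes, topic_sizes, print_report) are made up for illustration, and it assumes the usual layout where every partition directory under log.dirs is named <topic>-<partition>. It only reads file sizes and never deletes anything.

```python
import os
from collections import defaultdict


def partition_sizes(log_dir):
    """Return {partition_directory_name: total_bytes} for each subdirectory of log_dir."""
    sizes = {}
    for entry in os.scandir(log_dir):
        if not entry.is_dir():
            continue  # skip plain files kept at the top level of log.dirs
        total = 0
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    # A running broker may remove a segment while we scan.
                    pass
        sizes[entry.name] = total
    return sizes


def topic_sizes(sizes):
    """Sum partition sizes per topic, assuming directories named <topic>-<partition>."""
    totals = defaultdict(int)
    for dirname, size in sizes.items():
        topic, sep, partition = dirname.rpartition("-")
        # Anything that does not match the assumed pattern is reported under its own name.
        key = topic if sep and partition.isdigit() else dirname
        totals[key] += size
    return dict(totals)


def print_report(log_dir):
    """Print topics under log_dir, largest first."""
    totals = topic_sizes(partition_sizes(log_dir))
    for topic, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1024 ** 2:10.1f} MiB  {topic}")
```

    With the path from the question you would call print_report("/data/kafka"). Topics at the top of the list are the first candidates for a shorter retention.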