apache-spark apache-kafka spark-structured-streaming

How to slow down the write speed of a Kafka producer?


I use Spark to write data to Kafka like this:

df.write().format("kafka").save()

Can I control the write speed to Kafka to avoid putting pressure on it? Are there any options that help slow it down?


Solution

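  • I think setting linger.ms to a non-zero value would help. It controls how long the producer waits for additional records before sending the current batch, so records go out in fewer, larger requests (it lowers the request rate rather than imposing a hard throughput cap). Spark's Kafka sink only forwards producer settings whose keys are prefixed with "kafka.", so the code can look like the following:

    df.write.format("kafka").option("kafka.linger.ms", "100").save()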
    

    That said, whether you need this depends on your setup. If your Kafka cluster has enough capacity and is configured properly, I wouldn't worry too much about the write speed; Kafka is designed to absorb traffic spikes like this.
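To see why a non-zero linger.ms eases the load on the brokers, here is a toy model in plain Python. It is only an illustration under simplifying assumptions, not Kafka's actual producer logic: for a single partition, a batch is sent either when it is full or once linger_ms has elapsed since it was opened. The function name count_requests and the max_records parameter (a record-count stand-in for batch.size, which Kafka actually measures in bytes) are hypothetical, and in-flight request limits and retries are ignored.

```python
from typing import Iterable


def count_requests(arrivals_ms: Iterable[float], linger_ms: float, max_records: int) -> int:
    """Toy model: count send requests for one partition.

    Hypothetical simplification of producer batching: a batch opens when a
    record arrives and no batch is open, and it is sent when it holds
    max_records records or when linger_ms has elapsed since it opened,
    whichever comes first. In-flight limits, retries and byte-based
    batch sizing are ignored.
    """
    requests = 0
    batch_open_at = None  # timestamp at which the current batch was opened
    batch_len = 0         # number of records in the current batch

    for t in sorted(arrivals_ms):
        # The open batch's linger window expired before this record arrived,
        # so that batch has already been sent.
        if batch_open_at is not None and t - batch_open_at >= linger_ms:
            requests += 1
            batch_open_at, batch_len = None, 0

        if batch_open_at is None:
            batch_open_at = t
        batch_len += 1

        # A full batch is sent immediately without waiting for linger_ms.
        if batch_len >= max_records:
            requests += 1
            batch_open_at, batch_len = None, 0

    # Whatever is still buffered is sent once its linger window expires.
    if batch_len:
        requests += 1
    return requests


if __name__ == "__main__":
    arrivals = list(range(1000))  # one record per millisecond
    print(count_requests(arrivals, linger_ms=0, max_records=10_000))    # 1000
    print(count_requests(arrivals, linger_ms=100, max_records=10_000))  # 10
```

In this model, going from linger.ms=0 to 100 cuts the number of requests from 1000 to 10 for the same 1000 records. A real producer already batches records that arrive close together even with linger.ms=0, so the actual reduction can be smaller.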