Search code examples
apache-pulsar

non-persistent message is lost when throughput is high


I found that non-persistent messages are lost sometimes even though the my pulsar client is up and running. Those non-persistent messages are lost when the throughput is high (more than 1000 messages within a very short period of time. I personally think that this is not high). If I increase the parameter receiverQueueSize or change the message type to persistent message, the problem is gone.

I check the Pulsar source code (I am not sure this is the latest one)

https://github.com/apache/pulsar/blob/35f0e13fc3385b54e88ddd8e62e44146cf3b060d/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentDispatcherMultipleConsumers.java#L185

and I think that Pulsar simply ignore those non-persistent messages if no consumer is available to handle the newly arrived non-persistent messages. "No consumer" here means

  • no consumer subscribe the topic
  • OR all consumers are busy on processing messages received before

Is my understanding correct?


Solution

  • The Pulsar broker does not do any buffering of messages for the non-persistent topics, so if consumers are not connected or are connected but not keeping up with the producers, the messages are simply discarded.

    This is done because any in-memory buffering would be anyway very limited and not sufficient to change any of the semantics.

    Non-persistent topics are really designed for use cases where data loss is an acceptable situation (eg: sensors data which gets updates every 1sec and you just care about last value). For all the other cases, a persistent topic is the way to go.