In the Beam SDK, PubsubIO.Read provides an option (withIdAttribute) to deduplicate messages by a message ID attribute: https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html#withIdAttribute-java.lang.String-
When I check out the Pub/Sub client libraries (for Java and Python), I don't see a similar option for deduplicating messages by message ID.
So my questions are:
Thank you.
The Pub/Sub client libraries don't have an equivalent feature. It's a Beam feature: Cloud Dataflow, which runs the Beam pipeline, keeps a cache of the most recent message IDs (I don't know how many or for how long, but it's only a few minutes).
When you use Pub/Sub directly, it only guarantees at-least-once delivery, so it's recommended to make your processing idempotent.
In general, accommodating more-than-once delivery requires your subscriber to be idempotent when processing messages.
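If you want something similar in your own subscriber, here is a minimal sketch of a short-lived ID cache in plain Python. It is not part of the Pub/Sub client library, and I'm not claiming it matches what Dataflow does internally; `RecentIdCache`, `handle` and the 10-minute default are names and values I made up for illustration. It is in-memory and per-process, so it won't catch duplicates delivered to a different subscriber instance.

```python
import time
from collections import OrderedDict


class RecentIdCache:
    """Hypothetical helper: remembers message IDs for ttl_seconds."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # message_id -> expiry time. The TTL is constant, so insertion
        # order is also expiry order (oldest entry first).
        self._seen = OrderedDict()

    def first_time(self, message_id):
        """Return True if message_id was not seen within the TTL window."""
        now = self._clock()
        # Evict expired entries from the front of the ordered dict.
        while self._seen:
            _, expiry = next(iter(self._seen.items()))
            if expiry > now:
                break
            self._seen.popitem(last=False)
        if message_id in self._seen:
            return False
        self._seen[message_id] = now + self._ttl
        return True


def handle(message_id, payload, cache, process):
    """Run process(payload) only for IDs not seen recently.

    Returns True if the payload was processed, False if it was skipped
    as a duplicate. Either way the caller can ack the message afterwards.
    """
    if cache.first_time(message_id):
        process(payload)
        return True
    return False
```

In a Python subscriber callback you would presumably call something like `handle(message.message_id, message.data, cache, do_work)` and then `message.ack()`. Since the cache forgets IDs after the TTL and is lost on restart, `do_work` should still be idempotent where it matters.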