What is the main difference between the new PNCounter and IAtomicLong in Hazelcast?


I'm having trouble understanding the new features of Hazelcast 5.0. What is the main difference between these two data structures? As I understand it, PNCounter is a counter that replicates its data, and once there are no more updates the replicas converge to the same value. I want to understand how Hazelcast's PNCounter handles concurrency.

Is the maximum replica count related to the maximum number of nodes running in the Hazelcast cluster?

How does it work internally? I need to understand this because I'm working with a counter that tracks the activity of several clients. We create 1000 or more PNCounters for different activities, because I don't know whether a single PNCounter would work.

Does the client know which counter it needs to connect to, or does the counter follow a certain logic flow? I don't understand this feature, and I really want to know the difference between PNCounter and IAtomicLong.

To me, it looks like an IAtomicLong with the added feature that it can replicate.


Solution

  • It all boils down to the CAP Theorem.

    In summary, out of Consistency, Availability & Partition Tolerance, you can pick 2 out of 3. And since Hazelcast is distributed by nature, it has to tolerate network partitions, so your choice is between Consistency and Availability.

    IAtomicLong is a member of CP Subsystem API

    -- https://docs.hazelcast.com/imdg/4.2/data-structures/iatomiclong

    A Conflict-free Replicated Data Type (CRDT) is a distributed data structure that achieves high availability by relaxing consistency constraints. There may be several replicas for the same data and these replicas can be modified concurrently without coordination. This means that you may achieve high throughput and low latency when updating a CRDT data structure. On the other hand, all of the updates are replicated asynchronously.

    -- https://docs.hazelcast.com/imdg/4.2/data-structures/pn-counter

    In summary, IAtomicLong sacrifices Availability for Consistency. The result will always be correct, but it might not always be available.

    PNCounter makes the opposite trade-off. It's always available (depending on the number of nodes in the cluster, of course) but it's only eventually consistent, as it's asynchronously replicated.
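
To make the "modified concurrently without coordination" part more concrete, here is a minimal, self-contained sketch of the general PN counter idea. It is not Hazelcast's actual implementation: the class name `PNCounterSketch`, the replica IDs, and the explicit `merge` call are all assumptions made for illustration. Each replica only ever updates its own slot in two grow-only maps (one for increments, one for decrements), and replicas converge by taking the per-slot maximum when they exchange state.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative PN counter replica (hypothetical, not Hazelcast code). */
final class PNCounterSketch {
    private final String replicaId;
    // Total increments and decrements observed so far, keyed by replica ID.
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    PNCounterSketch(String replicaId) {
        this.replicaId = replicaId;
    }

    /** Applies a local update; only this replica's own slot is touched. */
    void add(long delta) {
        if (delta >= 0) {
            increments.merge(replicaId, delta, Long::sum);
        } else {
            decrements.merge(replicaId, -delta, Long::sum);
        }
    }

    /** Current local view: sum of increments minus sum of decrements. */
    long get() {
        long total = 0;
        for (long v : increments.values()) total += v;
        for (long v : decrements.values()) total -= v;
        return total;
    }

    /** Merges another replica's state by taking the maximum per slot. */
    void merge(PNCounterSketch other) {
        other.increments.forEach((id, v) -> increments.merge(id, v, Long::max));
        other.decrements.forEach((id, v) -> decrements.merge(id, v, Long::max));
    }

    public static void main(String[] args) {
        PNCounterSketch a = new PNCounterSketch("A");
        PNCounterSketch b = new PNCounterSketch("B");

        // Concurrent updates on two replicas, with no coordination.
        a.add(5);
        b.add(3);
        b.add(-1);
        System.out.println(a.get() + " vs " + b.get()); // views differ: 5 vs 2

        // Asynchronous replication, modeled here as an explicit state exchange.
        a.merge(b);
        b.merge(a);
        System.out.println(a.get() + " vs " + b.get()); // converged: 7 vs 7
    }
}
```

Because the merge is a per-slot maximum, it can be applied in any order and any number of times with the same result, which is why the replicas need no locking between them. In Hazelcast itself you do not write any of this: a PNCounter is obtained with `hazelcastInstance.getPNCounter(name)` and an IAtomicLong with `hazelcastInstance.getCPSubsystem().getAtomicLong(name)`.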