erlang:make_ref vs erlang:unique_integer

Is there any rule of thumb for when to prefer erlang:make_ref/0 over erlang:unique_integer/0,1, or the other way around?

Most examples of make_ref seem to involve messages or requests. In these cases the reference is short-lived, and make_ref could be swapped for unique_integer without much trouble.

Is there a difference in terms of speed or guarantees? When identifying long-lived objects, or data that will be persisted and/or exchanged with other systems, should one be preferred over the other?

Solution

unique_integer([monotonic]) needs to be synchronized across the schedulers, so it is slower than non-monotonic unique_integer or make_ref.
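If you want to check the speed difference on your own machine, here is a rough timing sketch. The module name unique_bench and its functions are made up for this example. It spawns several processes so that more than one scheduler is busy, which is where the synchronization cost should show up. Treat the numbers as indicative only, since they depend on the OTP release, the number of schedulers and the load.

```erlang
-module(unique_bench).
-export([run/0, run/2]).

%% Hypothetical helper module, not part of OTP.

%% Default run: 4 processes, 1,000,000 calls each.
run() -> run(1000000, 4).

%% Returns [{Name, ElapsedMicroseconds}] for each generator.
run(N, Procs) ->
    [{Name, time_parallel(F, N, Procs)} || {Name, F} <- cases()].

cases() ->
    [{make_ref, fun erlang:make_ref/0},
     {unique_integer, fun erlang:unique_integer/0},
     {unique_integer_monotonic,
      fun() -> erlang:unique_integer([monotonic]) end}].

%% Wall-clock time for Procs processes to each call F N times.
time_parallel(F, N, Procs) ->
    Parent = self(),
    {Micros, ok} =
        timer:tc(fun() ->
            Pids = [spawn_link(fun() ->
                                   loop(F, N),
                                   Parent ! {done, self()}
                               end) || _ <- lists:seq(1, Procs)],
            %% Wait for every worker to report back.
            _ = [receive {done, Pid} -> ok end || Pid <- Pids],
            ok
        end),
    Micros.

loop(_F, 0) -> ok;
loop(F, N) -> _ = F(), loop(F, N - 1).
```

Calling unique_bench:run() in a shell returns one {Name, Microseconds} pair per generator, which you can compare directly.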

Both non-monotonic unique_integer and make_ref are constructed in a similar way, from a pool exclusive to each scheduler. However, the unique_integer pool always starts at the same value, whereas the reference pool uses the current time as a seed.

Moreover, the guarantees for unique_integer only hold on the current node (more precisely, the current runtime system instance). You can run erl -eval 'io:format("~p", [erlang:unique_integer()]),init:stop().' several times in a row and you'll see repeated values. Thus, if you have a cluster, the integers may not be unique across nodes.

On the other hand, as references include the node that created them, they are unique across the connected nodes in a cluster.
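A small sketch to illustrate this: node/1 accepts a reference and returns the node that created it, while a unique integer is just an integer with no origin attached. The module name ref_node is made up for this example.

```erlang
-module(ref_node).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    Ref = erlang:make_ref(),
    Int = erlang:unique_integer(),
    %% node/1 works on references (as well as pids and ports) and
    %% returns the name of the node the reference was created on.
    RefNode = node(Ref),
    %% The integer carries no node information at all.
    {RefNode =:= node(), is_reference(Ref), is_integer(Int)}.
```

ref_node:demo() returns {true, true, true} on the node where it runs.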

When persisting data, you need to account for possible ERTS restarts, so you cannot strongly guarantee uniqueness over time. References may be enough (as I've shown, unique_integer is not strong enough once node restarts are taken into account), but you may be better off with a different strategy, such as having the persistence layer create the ids for you when you store the data (AUTO_INCREMENT in MySQL).

But you can mix and match these as you wish. You can have each node pull a unique integer (let's call it node-id) from the persistence layer on boot and combine it with the unique_integer into a larger integer. Or you can have each node generate its own node-id from a combination of start time, strong randomness and a hash of its own name. It all depends on the requirements of your use case.
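A minimal sketch of the first option, assuming the node-id is a non-negative integer you already obtained from somewhere (the persistence layer, for instance). The module name cluster_id, the function new/1 and the 128-bit width reserved for the local part are my own assumptions, not anything OTP prescribes. The documentation only says unique integers can be larger than fixnums, so the code checks the width instead of trusting it.

```erlang
-module(cluster_id).
-export([new/1]).

%% Hypothetical module, not part of OTP.
%% Number of low bits reserved for the node-local unique integer.
%% 128 is an assumed width; the assertion in new/1 checks that it holds.
-define(LOCAL_BITS, 128).

%% NodeId is assumed to be unique per node start, e.g. handed out by
%% the persistence layer on boot.
new(NodeId) when is_integer(NodeId), NodeId >= 0 ->
    Local = erlang:unique_integer([positive]),
    %% Crash rather than silently overlap with the node-id bits.
    true = Local < (1 bsl ?LOCAL_BITS),
    (NodeId bsl ?LOCAL_BITS) bor Local.
```

The node-id can be recovered with Id bsr 128. If the node-id changes on every boot, ids generated before and after a restart cannot collide, even though unique_integer restarts from the same value.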