I have a function that will load lots of users (which takes a while) and store them in an atom. I am wondering if there is any difference between loading the users into a let binding and then resetting the atom or just loading them in the atom reset! function?
(let [all-users (get-users)]
  (reset! users all-users))
or
(reset! users (get-users))
Since reset! is a function, the call to (reset! users (get-users)) will behave as any other function call in Clojure: each of the S-expressions in the call will be evaluated and then passed as arguments to the function. This means that evaluation of (get-users) will happen first, and the result is then passed to reset!. As such, this will behave exactly as the let form does.
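To see the evaluation order for yourself, here is a minimal sketch. The get-users body, the events log, and the run helper are hypothetical stand-ins, not code from the question; the watch is only there to record when the atom is actually written.

```clojure
;; Hypothetical stand-ins for the names used in the question.
(def users  (atom nil))
(def events (atom []))   ; records the order in which things happen

(defn get-users []
  (swap! events conj :get-users-start)
  ;; ... the slow load would happen here ...
  (swap! events conj :get-users-done)
  [{:id 1 :name "alice"} {:id 2 :name "bob"}])

;; The watch runs each time the atom is written, so it marks the reset!.
(add-watch users :log (fn [_ _ _ _] (swap! events conj :reset)))

(defn run
  "Clears the log, runs the zero-argument fn load!, and returns the log."
  [load!]
  (reset! events [])
  (load!)
  @events)

(def direct  (run #(reset! users (get-users))))
(def via-let (run #(let [all-users (get-users)]
                     (reset! users all-users))))

direct   ; => [:get-users-start :get-users-done :reset]
via-let  ; => [:get-users-start :get-users-done :reset]
```

In both forms the load finishes before the atom is touched, so readers of users keep seeing the old value until the new one is ready.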
swap!
Where these concerns come into play is with swap!. Because you send swap! a function to be called inside a transaction (for an atom this is strictly a compare-and-set retry loop rather than an STM transaction), you have more control over whether your long-running job happens inside or outside it. For example, if you had functions poll-users-updates and update-users-from-poll, you could set the call to the first function to happen either inside or outside the transaction:
; outside the transaction
(swap! users update-users-from-poll (poll-users-updates))
; inside the transaction
(swap! users (fn [users] (update-users-from-poll users (poll-users-updates))))
The second form here is more likely to have to be restarted, since it will take longer for the update function to be run, leaving more time for some other write to the atom to force a restart.
In contrast, the first form would be less likely to force retries, and is thus generally preferred. On the other hand, if your poll-users-updates function also needed to operate on the current state of the users data (for instance, to find the timestamp of the most recently updated user, in order to do its poll more efficiently), then the second approach might be preferred, as it would ensure that you had the most recent value of users in making the poll.
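The retry behaviour can be reproduced deterministically in a single thread. In this sketch the counter and calls atoms are hypothetical, and the competing write is simulated by resetting the atom from inside the update function on its first call. That is purely for demonstration and not something real code should do.

```clojure
(def counter (atom 0))
(def calls   (atom 0))   ; how many times the update function has run

(swap! counter
       (fn [n]
         ;; On the first call only, simulate another thread writing to
         ;; the atom while this update function is still "busy".
         (when (= 1 (swap! calls inc))
           (reset! counter 100))
         (inc n)))

@counter ; => 101
@calls   ; => 2
```

The first attempt computes 1 from the stale value 0, but the atom has changed in the meantime, so swap! discards that result and calls the function again with 100. Anything slow or side-effecting inside the function would have run twice.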
What this highlights with respect to STM is that your update functions may be called multiple times. Saying that side-effecting functions are "dangerous inside atoms" is perhaps a bit strong. They can be dangerous though, and it's best to assume they are. Even when they aren't (such as when the effects are idempotent, meaning that running them once has the same result as running them multiple times), it is better to keep update functions free of side effects. This is true of both Clojure's refs and atoms, which retry in the case of conflicts. In contrast, agents do not have retry semantics, so it's okay to have side effects in functions sent to agents. Since an agent queues up update functions and runs them in order, there is no chance of conflicts occurring and thus no need to have retries.
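For comparison, a small sketch of the agent case. The names effects and total are hypothetical, and bumping a counter stands in for a real side effect such as logging or I/O.

```clojure
(def effects (atom 0))   ; counts how often the side effect happens
(def total   (agent 0))

;; Queue five actions; the agent runs them one at a time, in order.
(dotimes [_ 5]
  (send total (fn [n]
                (swap! effects inc)   ; stand-in for a real side effect
                (inc n))))

(await total)   ; block until the actions sent so far have run

@total   ; => 5
@effects ; => 5
```

Each action runs exactly once, so the side effect count matches the number of sends. In a standalone script you would also call (shutdown-agents) at the end so that the JVM can exit.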