Long running function in Clojure atom

I have a function that loads lots of users (which takes a while) and stores them in an atom. I am wondering if there is any difference between loading the users into a let binding and then resetting the atom, and just loading them directly inside the reset! call?

(let [all-users (get-users)]
    (reset! users all-users))

(reset! users (get-users))

Solution

They are the same, and here's why

Since reset! is a function, the call (reset! users (get-users)) behaves like any other function call in Clojure: each argument expression is evaluated first, and the resulting values are then passed to the function. This means that (get-users) runs to completion before reset! is even invoked, and only its finished result is passed to reset!. As such, this behaves exactly as the let form does.
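To make the evaluation order visible, here is a small sketch. get-users is a hypothetical stub standing in for the slow loader from the question, and call-log is an illustrative atom that records when the loader returns and when users actually changes (atom watches are called synchronously by reset!). Both forms should produce the same ordering.

```clojure
;; Hypothetical setup: `users` mirrors the question, `call-log` records ordering.
(def users (atom nil))
(def call-log (atom []))

(defn get-users
  "Hypothetical stub for the slow loader from the question."
  []
  (swap! call-log conj :get-users-returned)
  [:alice :bob])

;; Log every change to `users`; an atom's watches run synchronously on reset!.
(add-watch users :log
           (fn [_key _ref _old _new]
             (swap! call-log conj :atom-updated)))

;; Direct form: the argument (get-users) is evaluated before reset! runs.
(reset! users (get-users))
(def direct-order @call-log)

;; let form: same thing, just with an explicit local binding.
(reset! call-log [])
(let [all-users (get-users)]
  (reset! users all-users))
(def let-order @call-log)

direct-order                ;=> [:get-users-returned :atom-updated]
(= direct-order let-order)  ;=> true
```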

To be contrasted with `swap!`

Where these concerns do come into play is with swap!. Because you pass swap! a function that it calls inside its compare-and-set loop (and calls again if another write to the atom lands first), you have more control over whether your long-running job happens inside or outside that update. Atoms are not transactional the way refs are, but the retry behavior is similar. For example, given functions poll-users-updates and update-users-from-poll, you could make the call to the first function happen either outside or inside the update function:

; outside the update function: (poll-users-updates) is evaluated once, before swap! runs
(swap! users update-users-from-poll (poll-users-updates))
; inside the update function: (poll-users-updates) runs again on every retry
(swap! users (fn [users] (update-users-from-poll users (poll-users-updates))))

The second form is more likely to be retried: because the update function takes longer to run, there is a bigger window for some other write to the atom to land in the meantime, which makes swap! throw away its result and call the function again.
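The retry difference can be demonstrated deterministically. In the sketch below, poll-users-updates and update-users-from-poll are hypothetical stubs (the poll just counts how often it runs), and swap-with-one-conflict! is an illustrative helper, not part of clojure.core, that behaves like swap! but forces exactly one competing write to land during the first call of the update function. The competing write is done on the same thread purely to keep the example deterministic; in real code it would come from another thread.

```clojure
(def users (atom {}))
(def poll-count (atom 0))

(defn poll-users-updates
  "Hypothetical stub: counts how often it runs and returns a fixed update."
  []
  (swap! poll-count inc)
  {:alice {:active true}})

(defn update-users-from-poll
  "Hypothetical stub: merges polled updates into the current users map."
  [current updates]
  (merge current updates))

(defn swap-with-one-conflict!
  "Illustrative helper: like swap!, but one competing write lands while the
  update function runs for the first time, forcing swap! to retry once."
  [a f & args]
  (let [conflicted? (atom false)]
    (swap! a (fn [v]
               (when (compare-and-set! conflicted? false true)
                 ;; Stands in for another thread writing to the atom mid-update.
                 (swap! a assoc :bob {:active false}))
               (apply f v args)))))

;; Poll outside the update function: evaluated once, reused on the retry.
(swap-with-one-conflict! users update-users-from-poll (poll-users-updates))
(def outside-polls @poll-count)   ;=> 1

;; Poll inside the update function: re-run on the retry.
(reset! users {})
(reset! poll-count 0)
(swap-with-one-conflict! users
                         (fn [users] (update-users-from-poll users (poll-users-updates))))
(def inside-polls @poll-count)    ;=> 2
```

In both cases the final value contains both the polled update and the competing write; the difference is only in how many times the (possibly slow) poll ran.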

In contrast, the first form is less likely to force retries, and is thus generally preferred. On the other hand, if your poll-users-updates function also needed to operate on the current state of the users data (for instance, to find the timestamp of the most recently updated user in order to poll more efficiently), then the second approach might be preferred, as it ensures the poll is made against the most recent value of users.
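A minimal sketch of that second case, under the assumption (not from the original) that users is a map of name to {:updated-at timestamp} and that a hypothetical variant of poll-users-updates takes the timestamp to poll from. latest-update is likewise an illustrative helper. Because the poll reads the value swap! passes in, a retry re-runs it against whatever the atom holds at that moment.

```clojure
(def users (atom {"alice" {:updated-at 100}
                  "bob"   {:updated-at 250}}))

(defn latest-update
  "Timestamp of the most recently updated user (0 if there are none)."
  [users]
  (reduce max 0 (map :updated-at (vals users))))

(defn poll-users-updates
  "Hypothetical stub: pretends the server returns one user changed after `since`."
  [since]
  {"carol" {:updated-at (inc since)}})

(defn update-users-from-poll
  "Hypothetical stub: merges polled updates into the current users map."
  [current updates]
  (merge current updates))

;; The poll depends on the value swap! hands us, so it must run inside the update.
(swap! users
       (fn [users]
         (update-users-from-poll users (poll-users-updates (latest-update users)))))

(get-in @users ["carol" :updated-at]) ;=> 251
```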

On retries and side-effects

What this highlights, for atoms as well as for refs in Clojure's STM, is that your update functions may be called multiple times. Saying that side-effecting functions are "dangerous inside atoms" is perhaps a bit strong. They can be dangerous, though, and it's best to assume they are. Even when they aren't (for example, when the effects are idempotent, meaning that calling the function once has the same effect as calling it several times), it is better to keep update functions free of side effects. This is true of both Clojure's refs and atoms, which retry in the case of conflicts. In contrast, agents do not have retry semantics, so it's okay to have side effects in functions sent to agents. Since an agent queues up the actions sent to it and runs them one at a time, there is no chance of conflicts occurring and thus no need for retries.
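For comparison, here is a rough sketch of a side-effecting action sent to an agent. record-and-inc is a hypothetical action, and the effects atom is an illustrative stand-in for a real external side effect (sending an email, writing a log line, and so on). Each sent action runs exactly once, one at a time, so the recorded effects line up with the sends.

```clojure
(def counter (agent 0))
(def effects (atom []))   ; stand-in for an external side effect

(defn record-and-inc
  "Hypothetical agent action: records the value it saw, then increments it."
  [n]
  (swap! effects conj n)
  (inc n))

(dotimes [_ 3]
  (send counter record-and-inc))

(await counter)   ; block until all three queued actions have run

@counter  ;=> 3
@effects  ;=> [0 1 2]

;; Only in a standalone script: release the agent thread pool so the JVM can exit.
;; Don't call this in a REPL session where you still want to use agents.
(shutdown-agents)
```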
