multithreading clojure parallel-processing singleton stm

Canonical Way to Ensure Only One Instance of a Service Is Running / Starting / Stopping in Clojure?

I'm writing a stateful server in Clojure backed by Neo4j that can serve socket requests, like HTTP. Which means, of course, that I need to be able to start and stop socket servers from within this server. Design-wise, I would want to be able to declare a "service" within this server and start and stop it.

What I'm trying to wrap my mind around in Clojure is how to ensure that starting and stopping these services is thread-safe. This server I'm writing will have NREPL embedded inside it and process incoming requests in a parallel way. Some of these requests will be administrative: start service X, stop service Y. Which opens up the possibility that two start requests come in at the same time.

Starting should synchronously check a "running" flag and a "starting" flag and fail if either is set. In the same transaction, the "starting" flag should be set.
After the "starting" flag is set, the transaction closes. That makes the "starting" flag visible to other transactions.
Then the (start) function actually starts the service.
If (start) succeeds, both flags are updated synchronously: "running" is set and "starting" is cleared.
If (start) fails, the "starting" flag is cleared and the exception is returned.

Stopping needs the same thing: checking the "running" flag, and checking and setting its own "stopping" flag.
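To make the start sequence concrete, here is a minimal sketch of it, assuming a single ref holding the flags. The names state, claim-start! and start-service! are hypothetical, as is start-fn, a zero-argument function standing in for whatever really starts the service. On failure this sketch rethrows the exception rather than returning it.

```clojure
;; Hypothetical flag map; not the real service state.
(def ^:private state
  (ref {:running false :starting false :stopping false}))

(defn- claim-start!
  "Atomically checks the flags and sets :starting, or throws."
  []
  (dosync
   (let [{:keys [running starting]} (ensure state)]
     (when (or running starting)
       (throw (IllegalStateException. "Service already running or starting.")))
     (alter state assoc :starting true))))

(defn start-service!
  "Claims the :starting flag, then runs start-fn outside any transaction."
  [start-fn]
  (claim-start!)
  (try
    (let [result (start-fn)]
      ;; Success: set :running and clear :starting in one transaction.
      (dosync (alter state assoc :running true :starting false))
      result)
    (catch Throwable t
      ;; Failure: clear :starting so a later start can be attempted.
      (dosync (alter state assoc :starting false))
      (throw t))))
```

The side effect runs between two short transactions, so a transaction retry can never trigger a second start.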

I'm trying to reason through all possible combinations of (start) and (stop).

Have I missed anything?

Is there a library for this already? If not, what should a library like this look like? I'll open source it and put it on Github.

Edit:

This is what I have so far. There's a hole I can see though. What am I missing?

(ns extenium.db
  (:require [clojure.tools.logging :as log])
  (:import org.neo4j.graphdb.factory.GraphDatabaseFactory))

(def ^:private
  db- (ref {:ref nil
            :running false
            :starting false
            :stopping false}))

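(defn stop []
  ;; Claim the :stopping flag atomically, or fail fast.
  (dosync
   (if (or (not (:running (ensure db-)))
           (:stopping (ensure db-)))
     (throw (IllegalStateException. "Database already stopped or stopping."))
     (alter db- assoc :stopping true)))
  (try
    (log/info "Stopping database")
    ;; db- is a ref, so it must be dereferenced before the map lookup.
    (.shutdown (:ref @db-))
    ;; Clear the handle and the :running flag together.
    (dosync
     (alter db- assoc :ref nil :running false))
    (log/info "Stopped database")
    (finally
      (dosync
       (alter db- assoc :stopping false)))))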

In the try block, I log, then call .shutdown, then log again. If the first log fails (I/O exceptions can happen), then (:stopping db-) is set to false, which unblocks it and is fine. .shutdown is a void function from Neo4j, so I don't have to evaluate a return value. If it fails, (:stopping db-) is set to false, so that's fine too. Then I set the (:ref db-) to nil. What if that fails? (:stopping db-) is set to false, but the (:ref db-) is left hanging. So that's a hole. Same case with the second log call. What am I missing?

Would this be better if I just used Clojure's locking primitives instead of a ref dance?

Solution

This is actually a natural fit for a simple lock:

(locking x
  (do-stuff))

Here x is the object on which to synchronize.
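As a rough illustration of how this could look for a service, here is a sketch that guards both start and stop with one lock object. Everything named here is an assumption for the example: lifecycle-lock, service, start! and stop! are made up, start-fn is a zero-argument function that must return a truthy handle, and stop-fn takes that handle.

```clojure
;; Hypothetical lock object and handle holder.
(def ^:private lifecycle-lock (Object.))
(def ^:private service (atom nil))

(defn start!
  "Starts the service unless one is already running."
  [start-fn]
  (locking lifecycle-lock
    (when @service
      (throw (IllegalStateException. "Service already running.")))
    ;; A non-nil handle doubles as the "running" flag.
    (reset! service (start-fn))))

(defn stop!
  "Stops the running service, passing its handle to stop-fn."
  [stop-fn]
  (locking lifecycle-lock
    (let [handle @service]
      (when-not handle
        (throw (IllegalStateException. "Service not running.")))
      (stop-fn handle)
      (reset! service nil))))
```

Because the check, the side effect and the state update all happen while the lock is held, no separate "starting" or "stopping" flags are needed. A second caller simply blocks until the first one is done.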

To elaborate: starting and stopping a service is a side effect, and side effects should not be initiated from inside a transaction (which may be retried), except possibly as Agent actions. Here, though, locks are exactly what the design calls for. Note that there's nothing wrong with using them in Clojure when they are a good fit for the problem at hand; in fact, I would say locking is the canonical solution here. (See Stuart Halloway's Lancet, introduced in Programming Clojure (1st ed.), for an example of a Clojure library using locks which has seen some widespread use, mostly in Leiningen.)

Update: Adding fail-fast behaviour:

This is still a good fit for a lock, namely a java.util.concurrent.locks.ReentrantLock:

(import java.util.concurrent.locks.ReentrantLock)

(def lock (ReentrantLock.))

(defn start []
  (if (.tryLock lock)
    (try
      (do-stuff)
      (finally (.unlock lock)))
    (do-other-stuff)))

(do-stuff) will be executed if lock acquisition succeeds; otherwise, (do-other-stuff) will happen. The current thread will not block waiting for the lock in either case.
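For a version you can paste into a REPL, here is a small sketch of the same pattern with the placeholders filled in. The names start-lock and try-start! are hypothetical, start-fn is an assumed zero-argument function, and returning :busy is just one possible choice for the fail-fast branch.

```clojure
(import java.util.concurrent.locks.ReentrantLock)

;; Hypothetical lock guarding the start sequence.
(def ^:private ^ReentrantLock start-lock (ReentrantLock.))

(defn try-start!
  "Runs start-fn if the lock is free; otherwise returns :busy immediately."
  [start-fn]
  (if (.tryLock start-lock)
    (try
      (start-fn)
      (finally (.unlock start-lock)))
    :busy))
```

Two caveats: the lock is reentrant, so a nested call from the thread that already holds it will succeed rather than return :busy, and the lock only covers the time while start-fn is running. Remembering that the service is already up still needs a separate flag checked under the lock.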