I'm writing a stateful server in Clojure, backed by Neo4j, that can serve socket requests such as HTTP. That means, of course, that I need to be able to start and stop socket servers from within this server. Design-wise, I want to be able to declare a "service" within this server and start and stop it.
What I'm trying to wrap my mind around in Clojure is how to make starting and stopping these services thread-safe. The server will have nREPL embedded in it and will process incoming requests in parallel. Some of these requests will be administrative (start service X, stop service Y), which opens up the possibility that two start requests come in at the same time.
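Starting therefore needs to check a "running" flag and atomically check and set its own "starting" flag. Stopping needs the same thing: checking a "running" flag and checking and setting its own "stopping" flag.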
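One way to get that check-and-set without juggling several boolean flags is a single atom holding a lifecycle keyword, with `compare-and-set!` for each transition. This is a swapped-in technique rather than the code below, and `state`, `try-transition!` and `start!` are hypothetical names; `start-fn` stands in for whatever actually opens the socket server. A minimal sketch:

```clojure
;; Hypothetical lifecycle state for one service: :stopped, :starting,
;; :running or :stopping. One value replaces the separate boolean flags.
(def state (atom :stopped))

(defn try-transition!
  "Atomically moves the state from `from` to `to`. Returns true only for the
  caller that won the transition. Keywords are interned, so the identity
  comparison done by compare-and-set! works for them."
  [from to]
  (compare-and-set! state from to))

(defn start!
  "Hypothetical start: claims :stopped -> :starting, runs `start-fn`, then
  publishes :running, or rolls back to :stopped if `start-fn` throws."
  [start-fn]
  (when-not (try-transition! :stopped :starting)
    (throw (IllegalStateException. "Service already started or starting.")))
  (try
    (start-fn)
    (reset! state :running)
    (catch Throwable t
      (reset! state :stopped)
      (throw t))))
```

If two start requests race, only one `try-transition!` call can move the state from :stopped to :starting; the other gets false and throws without touching the service.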
I'm trying to reason through all possible combinations of (start) and (stop).
Have I missed anything?
Is there a library for this already? If not, what should such a library look like? I'll open-source it and put it on GitHub.
This is what I have so far. There's one hole I can already see, though. What am I missing?
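```clojure
(ns extenium.db
  (:require [clojure.tools.logging :as log])
  (:import org.neo4j.graphdb.factory.GraphDatabaseFactory))

(def ^:private
  db- (ref {:ref nil
            :running false
            :starting false
            :stopping false}))

(defn stop []
  ;; Check the flags and claim :stopping in one transaction.
  (dosync
    (if (or (not (:running (ensure db-)))
            (:stopping (ensure db-)))
      (throw (IllegalStateException. "Database already stopped or stopping."))
      (alter db- assoc :stopping true)))
  ;; The side effects run outside any transaction.
  (try
    (log/info "Stopping database")
    (.shutdown (:ref @db-))              ; deref the ref to reach the map
    (dosync
      (alter db- assoc :ref nil :running false))
    (log/info "Stopped database")
    (finally
      ;; Always release the :stopping claim, even if something above throws.
      (dosync
        (alter db- assoc :stopping false)))))
```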
In the try block I log, call .shutdown, then log again. If the first log call fails (I/O exceptions can happen), the finally block sets the :stopping flag back to false, which unblocks later stop calls, and that is fine. .shutdown is a void method in Neo4j, so there is no return value to check; if it throws, :stopping is reset too, so that's fine as well. Then I set (:ref @db-) to nil. What if that fails? :stopping is reset to false, but :ref is left pointing at the shut-down database. That's a hole. The same goes for the second log call. What am I missing?
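A minimal sketch of one way to close that hole: commit the new :ref, :running and :stopping values together in a single transaction after the shutdown succeeds, and reset only :stopping if the shutdown throws. To keep it runnable without Neo4j, the shutdown call is passed in as a function; `shutdown!` is a hypothetical parameter (for Neo4j it would be something like `#(.shutdown %)`). Logging, omitted here, could go after the final transaction, where a failure no longer affects the state.

```clojure
;; Stand-in for the question's db- ref (same keys); no Neo4j needed.
(def db- (ref {:ref nil :running false :starting false :stopping false}))

(defn stop
  "Sketch of the question's stop with all final state changes in one
  transaction. `shutdown!` is a hypothetical function that shuts down
  the handle stored under :ref."
  [shutdown!]
  ;; 1. Check the flags and claim :stopping atomically, or fail fast.
  (let [handle (dosync
                 (let [{:keys [running stopping] handle :ref} (ensure db-)]
                   (when (or (not running) stopping)
                     (throw (IllegalStateException.
                             "Database already stopped or stopping.")))
                   (alter db- assoc :stopping true)
                   handle))]
    ;; 2. The side effect runs outside any transaction.
    (try
      (shutdown! handle)
      (catch Throwable t
        ;; Shutdown failed: release the claim and leave :ref/:running alone.
        (dosync (alter db- assoc :stopping false))
        (throw t)))
    ;; 3. Shutdown succeeded: update every field in one transaction, so no
    ;; reader ever sees :stopping cleared while :ref still holds the old handle.
    (dosync (alter db- assoc :ref nil :running false :stopping false))
    nil))
```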
Would this be better if I just used Clojure's locking primitives instead of a ref dance?
This is actually a natural fit for a simple lock:

```clojure
(locking x
  (do-stuff))
```

Here x is the object on which to synchronize, and (do-stuff) stands for the start or stop logic.
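As a fuller sketch, here is a hypothetical service registry where the whole check-then-act for each start or stop happens under one monitor. `services`, `lifecycle-lock`, `start-service!` and `stop-service!` are illustrative names, not from the question, and `start-fn`/`stop-fn` stand in for the real socket-server calls:

```clojure
;; Hypothetical registry: service name -> running handle. The atom is not
;; needed for mutual exclusion (the lock does that); it just makes reads
;; from threads outside the lock safe.
(def services (atom {}))

;; The single object every start/stop synchronizes on.
(def lifecycle-lock (Object.))

(defn start-service!
  "Starts `service-name` via `start-fn` unless it is already running. The
  check and the start happen under one monitor, so two concurrent start
  requests cannot both call `start-fn`."
  [service-name start-fn]
  (locking lifecycle-lock
    (if (contains? @services service-name)
      (throw (IllegalStateException. (str service-name " already running")))
      (swap! services assoc service-name (start-fn)))))

(defn stop-service!
  "Stops `service-name` via `stop-fn`. If `stop-fn` throws, the registry
  is left unchanged."
  [service-name stop-fn]
  (locking lifecycle-lock
    (if-let [handle (get @services service-name)]
      (do (stop-fn handle)
          (swap! services dissoc service-name))
      (throw (IllegalStateException. (str service-name " not running"))))))
```

A second start request arriving concurrently blocks until the first one finishes, then sees the service running and throws. If you would rather reject it immediately, see the fail-fast variant in the update.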
To elaborate: starting and stopping a service is a side effect, and side effects should not be initiated from inside a transaction, because a transaction may be retried (the one exception is Agent actions, whose sends are held until the transaction commits). Here, though, locks are exactly what the design calls for. There's nothing wrong with using them in Clojure when they are a good fit for the problem at hand; in fact, I would say locking is the canonical solution here. (See Stuart Halloway's Lancet, introduced in Programming Clojure (1st ed.), for an example of a Clojure library that uses locks and has seen fairly widespread use, mostly in Leiningen.)
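Clojure can enforce the "no side effects in transactions" rule for you: wrapping a side-effecting body in `io!` makes it throw an IllegalStateException when it runs inside `dosync`. A small sketch, where `shutdown-service!` and `state` are hypothetical:

```clojure
(defn shutdown-service!
  "Hypothetical side-effecting call. The io! wrapper makes it throw
  IllegalStateException if it is ever called inside a dosync."
  []
  (io! (println "shutting down")))

(def state (ref :running))

;; Outside a transaction the side effect simply runs.
(shutdown-service!)

;; Inside one it is rejected before anything happens, so the ref-set
;; below is never reached and the transaction never commits.
(try
  (dosync
    (shutdown-service!)
    (ref-set state :stopped))
  (catch IllegalStateException e
    (println "rejected:" (.getMessage e))))   ; prints "rejected: I/O in transaction"
```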
Update: Adding fail-fast behaviour:
This is still a good fit for a lock, namely a java.util.concurrent.locks.ReentrantLock:
```clojure
(import java.util.concurrent.locks.ReentrantLock)

(def lock (ReentrantLock.))

(defn start []
  (if (.tryLock lock)        ; returns immediately, true only if we got the lock
    (try
      (do-stuff)
      (finally (.unlock lock)))
    (do-other-stuff)))
```

(do-stuff) will be executed if lock acquisition succeeds; otherwise, (do-other-stuff) will happen. The current thread will not block in either case.
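A slightly fuller sketch of the fail-fast idea, assuming a single hypothetical service whose running state is tracked in an atom; `start-fn`/`stop-fn` stand in for the real work, and the returned keywords (:busy, :started and so on) are made up for illustration:

```clojure
(import java.util.concurrent.locks.ReentrantLock)

;; One lock guards the lifecycle of one hypothetical service.
(def ^ReentrantLock lock (ReentrantLock.))
(def running? (atom false))

(defn start
  "Fail-fast start: if another thread holds the lock (a start or stop is in
  progress), return :busy at once instead of waiting."
  [start-fn]
  (if (.tryLock lock)
    (try
      (if @running?
        :already-running
        (do (start-fn)
            (reset! running? true)
            :started))
      (finally (.unlock lock)))
    :busy))

(defn stop
  "Fail-fast stop, mirroring start."
  [stop-fn]
  (if (.tryLock lock)
    (try
      (if @running?
        (do (stop-fn)
            (reset! running? false)
            :stopped)
        :not-running)
      (finally (.unlock lock)))
    :busy))
```

Because ReentrantLock is reentrant, a thread that already holds the lock succeeds when it calls tryLock again; the :busy path is only taken when a different thread holds it.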