
Implementing a data model to prevent common errors


There seem to be multiple ways to implement data models in Clojure:

  • ordinary built-in datatypes (maps/lists/sets/vectors)
  • built-in datatypes + metadata -- for example: (type ^{:type ::mytype} {:fieldname 1})
  • built-in datatypes + special accessor functions (for instance, getting a non-existent key from a map throws an exception, instead of silently returning nil)
  • deftype
  • defstruct
  • defrecord
  • defprotocol
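As a rough illustration of the third option (built-in datatypes plus special accessor functions), here is a minimal sketch of a strict lookup that fails loudly instead of returning nil. `get-strict` and `get-in-strict` are hypothetical helper names, not part of clojure.core or any library.

```clojure
;; Hypothetical helpers (not in clojure.core): strict lookups that throw
;; on a missing key instead of silently returning nil.
(defn get-strict
  "Returns the value of k in m, or throws ex-info if m has no such key."
  [m k]
  (if (contains? m k)
    (get m k)
    (throw (ex-info (str "Missing key " k)
                    {:key k :present-keys (set (keys m))}))))

(defn get-in-strict
  "Like get-in, but strict at every level of nesting."
  [m ks]
  (reduce get-strict m ks))

(get-strict {:fieldname 1} :fieldname)    ;=> 1
(get-in-strict {:a {:b 2}} [:a :b])       ;=> 2
;; (get-strict {:fieldname 1} :feildname) ; throws clojure.lang.ExceptionInfo
```

The ex-info data carries the keys that were present, which tends to make a typo in a key name obvious at the point of failure rather than several calls later.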

We've reached the point where maps/lists are no longer working well for us. We run into lots of errors that pre-conditions/post-conditions could easily catch, but that take a very long time to hunt down otherwise (and it's hard to write effective pre/post-conditions for functions that accept nested maps/lists/vectors). We're not sure which of the above to choose.

We have three major goals:

  • write idiomatic Clojure code
  • avoid spending large amounts of time hunting down stupid type errors
  • have confidence in our ability to change/refactor code without silently breaking anything

How can we harness the power of Clojure to help us?


Solution

  • Clojure culture is strongly supportive of the raw data types. Justifiably so. But explicit types can be useful. When your plain datatypes get sufficiently complex and specific, you essentially have an implicit datatype without the specification.

    Rely on constructors. This sounds a bit dirty, in an OOP kind of way, but a constructor is nothing more than a function that creates your data type safely and conveniently. A drawback of plain data structures is that they encourage creating the data on the fly: instead of calling (myconstructor ...), I'm tempted to compose my data directly. That leaves much potential for error, and causes problems if I later need to change the underlying data type.
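A minimal sketch of such a constructor, assuming a hypothetical account entity with `id`, `owner` and `balance` fields (these names are illustrative only). The `:pre` conditions live in one place, so every caller gets the same checks.

```clojure
;; Hypothetical constructor for illustration: all validation is centralised
;; here instead of being repeated wherever an account map is built by hand.
(defn make-account
  "Builds an account map, failing fast (AssertionError) on bad input."
  [id owner balance]
  {:pre [(integer? id) (pos? id) (string? owner) (number? balance)]}
  {:id id :owner owner :balance balance})

(make-account 1 "ada" 100)    ;=> {:id 1, :owner "ada", :balance 100}
;; (make-account 1 nil 100)   ; throws AssertionError from the :pre check
```

If the representation later changes (say, to a record), only the body of `make-account` has to change; callers keep calling the same function.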

    Records are the sweet spot. With all the fuss about raw data types, it's easy to forget that records do a lot of things that maps can do. They can be accessed the same way. You can call seq on them. You can destructure them the same way. The vast majority of functions that expect a map will accept a record as well.
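A short sketch of the map-like behaviour described above, using a hypothetical `Person` record. Keyword access, `get`, destructuring and `seq` all work as they would on a map.

```clojure
(defrecord Person [name age])          ; hypothetical record for illustration

(def ada (->Person "Ada" 36))

(:name ada)                            ;=> "Ada"
(get ada :age)                         ;=> 36
(let [{:keys [name age]} ada]          ; ordinary map destructuring
  [name age])                          ;=> ["Ada" 36]
(seq ada)                              ;=> ([:name "Ada"] [:age 36])
(map? ada)                             ;=> true
(instance? Person (assoc ada :age 37)) ;=> true, assoc keeps the record type
(instance? Person (dissoc ada :age))   ;=> false, dissoc of a declared field yields a plain map
(= ada (map->Person {:name "Ada" :age 36})) ;=> true
```

The `dissoc` case is worth knowing about: removing a declared field quietly turns the record back into a plain map.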

    Metadata will not save you. My main objection to relying on metadata is that it isn't reflected in equality.

    user> (= (with-meta [1 2 3] {:type :A})  (with-meta [1 2 3] {:type :B}))
    true
    

    Whether that's acceptable or not is up to you, but I'd worry about this introducing new subtle bugs.
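For contrast, records do include their type in equality. A small sketch with hypothetical record types `A` and `B`:

```clojure
(defrecord A [xs])   ; hypothetical record types for illustration
(defrecord B [xs])

(= (->A [1 2 3]) (->B [1 2 3]))   ;=> false, different record types
(= (->A [1 2 3]) (->A [1 2 3]))   ;=> true
(= (->A [1 2 3]) {:xs [1 2 3]})   ;=> false, a record never equals a plain map

;; The metadata :type key is visible to `type`, but ignored by `=`.
(type (with-meta [1 2 3] {:type :A}))   ;=> :A
```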


    The other datatype options:

    • deftype is only for low-level work, such as creating new basic or special-purpose data structures. Unlike defrecord, it doesn't bring all of the Clojure goodness along with it. For most work, it isn't necessary or advisable.
    • defstruct should be deprecated. When Rich Hickey introduced types and protocols, he essentially said that defrecord should be preferred from then on.
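To make the deftype point concrete, here is a sketch comparing a bare deftype with an equivalent defrecord. `PointT` and `PointR` are hypothetical names; neither type implements any extra interfaces.

```clojure
(deftype PointT [x y])     ; bare type: fields only
(defrecord PointR [x y])   ; record: map behaviour and value equality included

(def pt-type (->PointT 1 2))
(def pt-rec  (->PointR 1 2))

(.-x pt-type)    ;=> 1, field access via Java interop
(:x pt-type)     ;=> nil, no keyword lookup unless you implement ILookup yourself
(:x pt-rec)      ;=> 1
(map? pt-type)   ;=> false
(map? pt-rec)    ;=> true
(= (->PointT 1 2) (->PointT 1 2))   ;=> false, default identity equality
(= (->PointR 1 2) (->PointR 1 2))   ;=> true, value equality
```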

    Protocols are very useful, even though they feel like a bit of a departure from the (functions + data) paradigm. If you find yourself creating records, you should consider defining protocols as well.
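A minimal sketch of pairing records with a protocol; `Shape`, `area`, `Rect` and `Circle` are hypothetical names. One side benefit relevant to the question: passing a plain map where a `Shape` is expected fails immediately with an error naming the protocol, rather than returning nil somewhere downstream.

```clojure
(defprotocol Shape                      ; hypothetical protocol
  (area [shape] "Returns the area of shape."))

(defrecord Rect [w h]
  Shape
  (area [_] (* w h)))

(defrecord Circle [r]
  Shape
  (area [_] (* Math/PI r r)))

(area (->Rect 2 3))                 ;=> 6
(satisfies? Shape (->Circle 1))     ;=> true
(satisfies? Shape {:w 2 :h 3})      ;=> false
;; (area {:w 2 :h 3})  ; throws IllegalArgumentException: No implementation of method: :area ...
```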

    EDIT: I discovered another advantage to plain datatypes that hadn't been apparent to me earlier: if you're doing web programming, the plain datatypes can be converted to and from JSON efficiently and easily. (Libraries for doing this include clojure.data.json, clj-json, and my favourite, cheshire). With records and deftypes, the task is considerably more annoying.
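A rough sketch of the difference, assuming org.clojure/data.json is on the classpath (only its `write-str` and `read-str` functions are used). The `Person` record and the `person->json`/`json->person` helpers are hypothetical. Plain maps round-trip in one call each way; a record comes back as a plain map, so the type has to be rebuilt by hand.

```clojure
;; Assumes the org.clojure/data.json dependency is available.
(ns example.json
  (:require [clojure.data.json :as json]))

(defrecord Person [name age])   ; hypothetical record

;; Plain data: one call each way.
(def s (json/write-str {:name "Ada" :age 36}))
(json/read-str s :key-fn keyword)   ;=> {:name "Ada", :age 36}

;; Records: decoding yields a plain map, so each type needs its own
;; conversion functions (hypothetical helpers for illustration).
(defn person->json [p]
  (json/write-str (into {} p)))     ; explicitly convert to a plain map first

(defn json->person [s]
  (map->Person (json/read-str s :key-fn keyword)))
```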