Search code examples
typesclojurediscriminated-union

Idiomatic way to represent sum type (Either a b) in Clojure


Edited. My question now is: what idiomatic Clojure constructs are usually used instead of sum types in statically types languages? Consensus so far: use protocols if behaviour can be unified, use tagged pairs/maps otherwise, put necessary asserts in pre- and post-conditions.

Clojure provides many ways to express product types: vectors, maps, records..., but how do you represent sum types, also known as tagged unions and variant records? Something like Either a b in Haskell or Either[+A, +B] in Scala.

The first thing which comes to my mind is a map with a special tag: {:tag :left :value a}, but then all the code is going to be polluted with conditionals on (:tag value) and handling special cases if it's not there... What I'd like to ensure, is that :tag is always there, and it can take only one of the specified values, and corresponding value is consistently of the same type/behaviour and cannot be nil, and there is an easy way to see that I took care of all cases in the code.

I can think of a macro in the lines of defrecord, but for the sum types:

; it creates a special record type and some helper functions
(defvariant Either
   left Foo
   right :bar)
; user.Either

(def x (left (Foo. "foo")))   ;; factory functions for every variant
; #user.Either{:variant :left :value #user.Foo{:name "foo"}}
(def y (right (Foo. "bar")))  ;; factory functions check types
; SomeException...
(def y (right ^{:type :bar} ()))
; #user.Either{:variant :right :value ()}

(variants x) ;; list of all possible options is intrinsic to the value
; [:left :right]

Does a thing like this already exist? (Answered: no).


Solution

  • how do you represent sum types, also known as tagged unions and variant records? Something like Either a b in Haskell or Either[+A, +B] in Scala.

    Either has two uses: to return a value of one of two types or to return two values of the same type that should have different semantics based on the tag.

    The first use is only important when using a static type system. Either is basically the minimum solution possible given the constraints of the Haskell type system. With a dynamic type system, you can return values of any type you want. Either is not needed.

    The second use is significant but can be accomplished quite simply in two (or more) ways:

    1. {:tag :left :value 123} {:tag :right :value "hello"}
    2. {:left 123} {:right "hello"}

    What I'd like to ensure, is that :tag is always there, and it can take only one of the specified values, and corresponding value is consistently of the same type/behaviour and cannot be nil, and there is an easy way to see that I took care of all cases in the code.

    If you would like to ensure this statically, Clojure is probably not your language. The reason is simple: expressions do not have types until runtime--until they return a value.

    The reason a macro will not work is that at macro expansion time, you do not have runtime values--and hence runtime types. You have compile-time constructs like symbols, atoms, sexpressions, etc. You can eval them, but using eval is considered bad practice for a number of reasons.

    However, we can do a pretty good job at runtime.

    • What I'd like to ensure, is that :tag is always there,
    • and it can take only one of the specified values
    • and corresponding value is consistently of the same type/behaviour
    • and cannot be nil
    • and there is an easy way to see that I took care of all cases in the code.

    My strategy will be to convert everything that is normally static (in Haskell) to runtime. Let's write some code.

    ;; let us define a union "type" (static type to runtime value)
    (def either-string-number {:left java.lang.String :right java.lang.Number})
    
    ;; a constructor for a given type
    (defn mk-value-of-union [union-type tag value]  
      (assert (union-type tag)) ; tag is valid  
      (assert (instance? (union-type tag) value)) ; value is of correct type  
      (assert value)  
      {:tag tag :value value :union-type union-type}) 
    
    ;; "conditional" to ensure that all the cases are handled  
    ;; take a value and a map of tags to functions of one argument
    ;; if calls the function mapped to the appropriate tag
    (defn union-case-fn [union-value tag-fn]
      ;; assert that we handle all cases
      (assert (= (set (keys tag-fn))
                 (set (keys (:union-type union-value)))))
      ((tag-fn (:tag union-value)) (:value union-value)))
    
    ;; extra points for wrapping this in a macro
    
    ;; example
    (def j (mk-value-of-union either-string-number :right 2))
    
    (union-case-fn j {:left #(println "left: " %) :right #(println "right: " %)})
    => right: 2
    
    (union-case-fn j {:left #(println "left: " %)})
    => AssertionError Assert failed: (= (set (keys tag-fn)) (set (keys (:union-type union-value))))
    

    This code uses the following idiomatic Clojure constructs:

    • Data-driven programming: create a data structure which represents the "type". This value is immutable and first-class and you have the entire language available to implement logic with it. This is something that I don't believe Haskell can do: manipulate types at runtime.
    • Using maps to represent values.
    • Higher-order programming: passing a map of fns to another function.

    You could optionally use protocols if you are using Either for polymorphism. Otherwise, if you are interested in the tag, something of the form {:tag :left :value 123} is the most idiomatic. You will often see something like this:

    ;; let's say we have a function that may generate an error or succeed
    (defn somefunction []
      ...
      (if (some error condition)
        {:status :error :message "Really bad error occurred."}
        {:status :success :result [1 2 3]}))
    
    ;; then you can check the status
    (let [r (somefunction)]
      (case (:status r)
        :error
        (println "Error: " (:message r))
        :success
        (do-something-else (:result r))
        ;; default
        (println "Don't know what to do!")))