Search code examples
clojureinfinitelazy-sequencescodata

Terminology for example of codata in Clojure


Imagine the following function to give an infinite lazy sequence of fibonacci in Clojure:

(def fib-seq
  (concat
   [0 1]
   ((fn rfib [a b]
        (lazy-cons (+ a b) (rfib b (+ a b)))) 0 1)))

user> (take 20 fib-seq)
(0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181)

Assuming

  1. We take the pithy definition of codata as being "Codata are types inhabited by values that may be infinite".
  2. That this Clojure example doesn't use a static type system (from core.typed) and so any description of codata is a 'working definition'

My question is - What part of the function above is the 'codata'. Is it the anonymous function? Is it the lazy sequence?


Solution

  • Codata is the dual of data. You work with data via structural induction which means that data is always finite. You work with codata via coinduction which means codata is potentially infinite (but not always).

    In any case, if you can't properly define a finite toString or equality then it'll be codata:

    • Can we define a toString for an infinite streams? No, we'd need an infinite string.
    • Can we always define extensional equality for two infinite streams? No, that'd take forever.

    We can't do the above for streams because they're infinite. But even potentially infinite causes undecidability (i.e. we can't give a definite yes or no for equality or definitely give a string).

    So an infinite stream is codata. I think your second question is more interesting, is the function codata?

    Lispers say that code is data because features like S-expressions allow manipulating the program just like data. Clearly we have already have a string representation of Lisp (i.e. source code). We can also take a program and check if it's made up of equal S-expressions (i.e. compare the AST). Data!

    But let's stop thinking about the symbols that represent our code and instead start thinking about the meaning of our programs. Take the following two functions:

    (fn [a] (+ a a))
    (fn [a] (* a 2))
    

    They give the same results for all inputs. We shouldn't care that one uses * and the other uses +. It's not possible to compute if any two arbitrary functions are extensionally equal unless they only work on finite data (equality is then just comparing input-output tables). Numbers are infinite so that still doesn't solve our above example.

    Now let's think about converting a function to a string. Let's say we had access to the source representation of functions at runtime.

    (defn plus-two [a] (+ a 2))
    (defn plus-four [a] (plus-two (plus-two a)))
    (show-fn plus-four)
    ; "(plus-two (plus-two a))"
    

    Now, referential transparency says we can take function calls and replace them with the function bodies, with the variables substituted and the program always gives the same result. Let's do that for plus-two:

    (defn plus-four [a] (+ (+ a 2) 2))
    (show-fn plus-four)
    ; "(+ (+ a 2) 2)"
    

    Oh... The result is not the same. We broke referential transparency.

    So we also can't define toString or equality for functions. It's because they're codata!

    Here are some resources I found helpful for understanding codata better: