Search code examples
haskellclojurefunctional-programminglazy-evaluationcartesian-product

Cartesian Product of a finite sequence of potentially infinite sequences


The Problem

I need to create a function that, when given a finite sequence of potentially infinite sequences, it produces the sequence that is their "cartesian product".

i.e. given the sequence

'((1 2) (3 4))

the function produces (some ordering of):

'((1 3) (1 4) (2 3) (2 4)

Importantly, for every p in the list of cartesian products ps, there must be some natural number n such that (= p (last (take n ps))). Or, informally, you only need to iterate through the sequence a finite amount to reach any element in it.

This condition becomes important when dealing with infinite lists.

Solution in Haskell

In Haskell, this is how I would have done it:

interleave           :: [a] -> [a] -> [a]
interleave [] ys     = ys
interleave (x:xs) ys = x : interleave ys xs

combine :: [[a]] -> [[a]]
combine = foldr prod [[]]
  where
    prod xs css = foldr1 interleave [ [x:cs | cs <- css] | x <- xs]

And calling it you get the following:

combine [[0..] [0..]] = [[0,0,0],[1,0,0],[,1,0],[2,0,0],[0,0,1],[1,1,0],...

Solution in Clojure

And so I attempted to replicate this in Clojure, like so, (It's pretty much a direct translation):

(defn interleave
  "Returns a lazy sequence of the interleavings of sequences `xs` and `ys`
  (both potentially infinite), leaving no elements discarded."
  [xs ys]
  (lazy-seq
    (if-let [[x & xs*] (seq xs)]
      (cons x (interleave ys xs*))
      ys)))

(defn interleave*
  "Converts a sequence of potentially infinite sequences into its lazy
  interleaving."
  [xss]
  (lazy-seq
    (when-let [[xs & xss*] (seq xss)]
      (interleave xs (interleave* xss*)))))

(defn combine
  "Takes a finite sequence of potentially infinite sequences, and combines
  them to produce a possibly infinite sequence of their cartesian product."
  [xss]
  (if-let [[xs & xss*] (seq xss)]
    (interleave*
      (for [x xs]
        (for [cs (combine xss*)]
          (lazy-seq (cons x cs)))))
    '(()) ))

But when I run:

(take 1 (combine [(range) (range)]))

I get:

StackOverflowError   cfg.util/combine/iter--3464--3468/fn--3469/fn--3470/iter--3471--3475/fn--3476

So, how do I make it lazy enough, so as to avoid the stack overflow? Really, I don't understand how Clojure's lazy sequence model works which is the main problem.


Solution

  • I think your solution may be algorithmically intractable, reconstructing the sub-sequences time and again, much as the simple Fibonacci function:

    (defn fib [n]
      (case n
        (0N 1N) n
        (+ (fib (- n 1)) (fib (- n 2)))))
    

    ... recomputes its precedents.

    In any event, the search for [100 10] in the cartesian product of (range) and (range):

    (first (filter #{[100 10]} (combine [(range) (range)])))
    

    ... does not return in a reasonable time.


    I can offer you a faster though far less elegant solution.

    First, a couple of utilities:

    Something from @amalloy to compute the Cartesian product of finite sequences:

    (defn cart [colls]
      (if (empty? colls)
        '(())
        (for [x (first colls)
              more (cart (rest colls))]
          (cons x more))))
    

    A function adapted from the Clojure Cookbook to map the values of a map:

    (defn map-vals [f m] (zipmap (keys m) (map f (vals m))))
    

    Now for the function we want, which I've called enum-cart, as it enumerates the Cartesian product even of infinite sequences:

    (defn enum-cart [colls]
      (let [ind-colls (into (sorted-map) (map-indexed (fn [n s] [n (seq s)]) colls))      
            entries ((fn tins [ss] (let [ss (select-keys ss (map key (filter val ss)))]
                                     (lazy-seq
                                       (if (seq ss)
                                         (concat
                                           (map-vals first ss)
                                           (tins (map-vals next ss)))))))
                         ind-colls)
            seens (reductions
                    (fn [a [n x]] (update-in a [n] conj x))
                    (vec (repeat (count colls) []))
                    entries)]
        (mapcat
          (fn [sv [n x]] (cart (assoc sv n [x])))
          seens entries)))
    

    The idea is to generate an indexed sequence of entries, going round the non-exhausted sequences. From this we generate a companion sequence of what we have already seen from each sequence. We pairwise combine these two, generating the free cartesian product of the new element with what we have of the other sequences. The answer is the concatenation of these free products.

    For example

    (enum-cart [(range 3) (range 10 15)])
    

    ... produces

    ((0 10)
     (1 10)
     (0 11)
     (1 11)
     (2 10)
     (2 11)
     (0 12)
     (1 12)
     (2 12)
     (0 13)
     (1 13)
     (2 13)
     (0 14)
     (1 14)
     (2 14))
    

    And

    (first (filter #{[100 10]} (enum-cart [(range) (range)])))
    ;(100 10)
    

    ... returns more or less instantly.


    Notes

    • Is this better done in Knuth or elsewhere? I don't have access to it.
    • The last non-exhausted sequence need not be kept, as there is nothing else to use it.