Search code examples
clojurelazy-sequences

Partitioning across partitions in Clojure?


Here are some values. Each is a sequence of ascending (or otherwise grouped) values.

(def input-vals [[[1 :a] [1 :b] [2 :c] [3 :d] [3 :e]]
           [[1 :f] [2 :g] [2 :h] [2 :i] [3 :j] [3 :k]]
           [[1 :l] [3 :m]]])

I can partition them each by value.

=> (map (partial partition-by first) input-vals)
   ((([1 :a] [1 :b]) ([2 :c]) ([3 :d] [3 :e])) (([1 :f]) ([2 :g] [2 :h] [2 :i]) ([3 :j] [3 :k])) (([1 :l]) ([3 :m])))

But that gets me 3 sequences of partitions. I want one single sequence of partitioned groups.

What I want to do is return a single lazy sequence of (potentially) lazy sequences that are the respective partitions joined. e.g. I want to produce this:

((([1 :a] [1 :b] [1 :f] [1 :l]) ([2 :c] [2 :g] [2 :h] [2 :i]) ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m])))

Note that not all values appear in all sequences (there is no 2 in the third vector).

This is of course a simplification of my problem. The real data is a set of lazy streams coming from very large files, so nothing can be realised. But I think the solution for the above question is the solution for my problem.

Feel free to edit the title, I wasn't quite sure how to express it.


Solution

  • Try this horror:

    (defn partition-many-by [f comp-f s]
      (let [sorted-s (sort-by first comp-f s)
            first-list (first (drop-while (complement seq) sorted-s))
            match-val (f (first first-list))
            remains (filter #(not (empty? %)) 
                            (map #(drop-while (fn [ss] (= match-val (f ss))) %) 
                                 sorted-s))]
        (when match-val
          (cons
            (apply concat
              (map #(take-while (fn [ss] (= match-val (f ss))) %)
                   sorted-s))
            (lazy-seq (partition-many-by f comp-f remains))))))
    

    It could possibly be improved to remove the double value check (take-while and drop-while).

    Example usage:

    (partition-many-by identity [[1 1 1 1 2 2 3 3 3 3] [1 1 2 2 2 2 3] [3]])
    
    => ((1 1 1 1 1 1) (2 2 2 2 2 2) (3 3 3 3 3 3))