Search code examples
clojure

fast sorting algorithm for multiple ordered arrays


I have 4 arrays, that are ordered. I'd like to be able to combine them together into a single sorted datastructure and lazily take from that.

Is there an efficient way of doing this?

[1 3 4 6 9 10 15]
[2 3 6 7 8 9 10]
[1 3 6 7 8 9 10]
[1 2 3 4 8 9 10]

=> [1 1 1 2 2 3 3 3 3 4]

Solution

  • Clojure comes with a library of functions producing or operating on lazy sequences, such as map, iterate and take-while. I believe a merge algorithm could be expressed by combining them, something like this.

    (defn insert-into-sorted [dst x]
      (let [x0 (first x)
            a (take-while #(< (first %) x0) dst)
            b (drop (count a) dst)]
        (vec (concat a [x] b))))
    
    (defn next-arrays [arrs]
      (let [[f & r] arrs
            restf (rest f)]
        (if (empty? restf)
          r
          (insert-into-sorted r restf))))
    
    (defn merge-sorted-arrays [arrs]
      (->> arrs
           (filter seq)
           (sort-by first)
           (iterate next-arrays)
           (take-while seq)
           (map ffirst)))
    

    And we can call it like this:

    (merge-sorted-arrays [[1 3 4 6 9 10 15]
                          [2 3 6 7 8 9 10]
                          [1 3 6 7 8 9 10]
                          [1 2 3 4 8 9 10]])
    ;; => (1 1 1 2 2 3 3 3 3 4 4 6 6 6 7 7 8 8 8 9 9 9 9 10 10 10 10 15)
    

    It is true that you could do something like (sort (apply concat ...)) but that could turn out inefficient if you have a lot of data.

    Update: A previous version of this code contained a call to count that limited its applicability to merging sequences of finite length. By changing it to using empty? instead, there is no such limitation and we can now use it to merge sequences of infinite length:

    (take 12 (merge-sorted-arrays [(iterate (partial + 1.1) 1) (iterate (partial + 1.11) 1)]))
    ;; => (1 1 2.1 2.1100000000000003 3.2 3.2200000000000006 4.300000000000001 4.330000000000001 5.4 5.440000000000001 6.5 6.550000000000002)