Split a sequence by delimiter in Clojure?


Say I have a sequence in Clojure like

'(1 2 3 6 7 8)

and I want to split it up so that the list splits whenever an element divisible by 3 is encountered, so that the result looks like

'((1 2) (3) (6 7 8))

(EDIT: What I actually need is

[[1 2] [3] [6 7 8]]

but I'll take the sequence version too :)

What is the best way to do this in Clojure?

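partition-by is no help, because it only starts a new group when the predicate's value changes, so consecutive multiples of 3 end up together: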

(partition-by #(= (rem % 3) 0) '(1 2 3 6 7 8))
; => ((1 2) (3 6) (7 8))

split-with is close, but it only splits once, at the first multiple of 3:

(split-with #(not (= (rem % 3) 0)) '(1 2 3 6 7 8))
; => [(1 2) (3 6 7 8)]

Solution

  • This is an interesting problem. I recently added a function split-using to the Tupelo library, which seems like a good fit here. I left the spyx debug statements in the code below so you can see how things progress:

    (ns tst.clj.core
      (:use clojure.test tupelo.test)
      (:require
        [tupelo.core :as t]  ))
    (t/refer-tupelo)
    
    (defn start-segment? [vals]
      (zero? (rem (first vals) 3)))
    
    (defn partition-using [pred vals-in]
      (loop [vals   vals-in
             result []]
        (if (empty? vals)
          result
          (t/spy-let [
              out-first               (take 1 vals)
              [out-rest unprocessed]  (split-using pred (spyx (next vals)))
              out-vals                (glue out-first out-rest)
              new-result              (append result out-vals)]
            (recur unprocessed new-result)))))
    

    Which gives us output like:

    out-first => (1)
    (next vals) => (2 3 6 7 8)
    [out-rest unprocessed] => [[2] (3 6 7 8)]
    out-vals => [1 2]
    new-result => [[1 2]]
    out-first => (3)
    (next vals) => (6 7 8)
    [out-rest unprocessed] => [[] [6 7 8]]
    out-vals => [3]
    new-result => [[1 2] [3]]
    out-first => (6)
    (next vals) => (7 8)
    [out-rest unprocessed] => [[7 8] ()]
    out-vals => [6 7 8]
    new-result => [[1 2] [3] [6 7 8]]
    
    (partition-using start-segment? [1 2 3 6 7 8]) => [[1 2] [3] [6 7 8]]
    

    or for a larger input vector:

    (partition-using start-segment? [1 2 3 6 7 8 9 12 13 15 16 17 18 18 18 3 4 5])
       => [[1 2] [3] [6 7 8] [9] [12 13] [15 16 17] [18] [18] [18] [3 4 5]]
    

    You could also create a solution using nested loop/recur, but that is already coded up in the split-using function:

    (defn split-using   
      "Splits a collection based on a predicate with a collection argument.
      Finds the first index N such that (pred (drop N coll)) is true. Returns a length-2 vector
      of [ (take N coll) (drop N coll) ]. If pred is never satisfied, [ coll [] ] is returned."
      [pred coll]
      (loop [left  []
             right (vec coll)]
        (if (or (empty? right) ; don't call pred if no more data
                (pred right))
          [left right]
          (recur  (append left (first right))
                  (rest right)))))
    

    Actually, partition-using seems like it would be useful in the future, so it has now been added to the Tupelo library.
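
    If pulling in Tupelo is not an option, the same result can probably be had with clojure.core alone. The following is only a minimal sketch, not the Tupelo implementation: partition-starting is a hypothetical name, and unlike start-segment? above, its predicate receives a single element rather than the remaining collection.

    ```clojure
    (defn partition-starting
      "Hypothetical helper (not part of Tupelo or clojure.core).
      Returns a vector of vectors, opening a new segment at every element
      for which (start? x) is truthy."
      [start? coll]
      (reduce (fn [acc x]
                (if (or (empty? acc) (start? x))
                  ;; first element, or a delimiter: open a new segment
                  (conj acc [x])
                  ;; otherwise append x to the last segment
                  (conj (pop acc) (conj (peek acc) x))))
              []
              coll))

    (partition-starting #(zero? (rem % 3)) [1 2 3 6 7 8])
    ;; => [[1 2] [3] [6 7 8]]
    ```

    Because it accumulates into vectors with reduce, this sketch is eager and returns the [[1 2] [3] [6 7 8]] shape asked for in the question; it would not work on an infinite sequence.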