Tags: functional-programming, clojure, clojure-core.logic

How to split a Clojure string into a sequence of strings?


Suppose I have a simple string that I want to parse into an array of strings:

"add (multiply (add 1 2) (add 3 4)) (add 5 6)"

How do I parse it into three strings, split on the outermost parentheses?

add
(multiply (add 1 2) (add 3 4))
(add 5 6)

With my OOP mindset, I think I need an indexed for loop and if/else statements to do this.

I tried parsing it with a string split, but I got:

command
(multiply
1
(add
3
2))
(add
3
4)

which is not what I expected.


Solution

  • You can either use the built-in LispReader:

    ;; clojure.string is used below for trimming, so make sure it is loaded.
    (require 'clojure.string)
    (import '[clojure.lang LispReader LineNumberingPushbackReader])
    (import '[java.io StringReader])
    
    (defn could-read? [pr]
      (try
        (LispReader/read pr nil)
        true
        (catch RuntimeException e false)))
    
    (defn paren-split2 [s]
      (let [sr (StringReader. s)
            pr (LineNumberingPushbackReader. sr)
            inds (loop [result [0]]
                   (if (could-read? pr)
                     (recur (conj result (.getColumnNumber pr)))
                     result))
            len (count s)
            bounds (partition 2 1 inds)]
        (for [[l u] bounds
              :let [result (clojure.string/trim (subs s l (min len u)))] :when (seq result)]
          result)))
    
    (paren-split2 "add (    multiply (   add      1 2) (add 3 4))   (add 5   6  )")
    ;; => ("add" "(    multiply (   add      1 2) (add 3 4))" "(add 5   6  )")
    

    or you can hand-code a parser:

    (def conj-non-empty ((remove empty?) conj))
    
    (defn acc-paren-split [{:keys [dst depth current] :as state} c]
      (case c
        \( (-> state
               (update :depth inc)
               (update :current str c))
        \) (if (= 1 depth)
             {:depth 0 :dst (conj-non-empty dst (str current c)) :current ""}
             (-> state
                 (update :depth dec)
                 (update :current str c)))
        \space (if (zero? depth)
                 {:depth 0 :dst (conj-non-empty dst current) :current ""}
                 (update state :current str c))
        (update state :current str c)))
    
    (defn paren-split [s]
      (:dst (reduce acc-paren-split
                    {:dst []
                     :depth 0
                     :current ""}
                    s)))
    
    (paren-split "add (    multiply (   add      1 2) (add 3 4))   (add 5   6  )")
    ;; => ["add" "(    multiply (   add      1 2) (add 3 4))" "(add 5   6  )"]
    

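    Note: Both approaches preserve the original spacing inside each extracted string. The LispReader version relies on column numbers, so it assumes the input is a single line.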
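
If preserving the original spacing is not a requirement, a third option is to wrap the input in square brackets, read it as data with clojure.edn, and print each top-level form back to a string. This is a different technique from the two above, shown only as a minimal sketch: the name paren-split-edn is made up for illustration, and it assumes every token in the input is valid EDN (balanced parentheses, readable symbols and numbers). Unlike the approaches above, it normalizes whitespace.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper (not part of the original answer).
;; Wrapping the input in [ ] lets the EDN reader return every
;; top-level form as an element of one vector; pr-str then turns
;; each form back into a string with normalized spacing.
(defn paren-split-edn [s]
  (mapv pr-str (edn/read-string (str "[" s "]"))))

(paren-split-edn "add (    multiply (   add      1 2) (add 3 4))   (add 5   6  )")
;; => ["add" "(multiply (add 1 2) (add 3 4))" "(add 5 6)"]
```

Because clojure.edn only reads data and never evaluates code, this variant may also be the more comfortable choice when the string comes from an untrusted source. Unbalanced input makes edn/read-string throw instead of returning a partial result.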