
How to remove empty XML tags with clojure.data.xml?


Given a namespaced XML document (namespaces omitted in this example):

<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <phone></phone>
    <school>
        <name></name>
        <state></state>
        <type></type>
    </school>
    <college>
        <name>mit</name>
        <address></address>
        <state></state>
    </college>
</foo>

How would you write a function, remove-empty-tags, with clojure.data.xml that returns the following?

<foo>
  <name>John</name>
  <address>1 hacker way</address>
  <college>
    <name>mit</name>
  </college>
</foo>

My solution so far is incomplete; it looks like some recursion might help:

(require '[clojure.data.xml :as xml]
         '[clojure.string :as str])

;; True when every item in e's :content is an Element (no text nodes).
(defn- child-element? [e]
  (let [content (:content e)]
    (= (count content)
       (count (filter #(instance? clojure.data.xml.node.Element %) content)))))


;; Keeps only the top-level nodes whose :content is non-empty (not recursive yet).
(defn remove-empty-tags
  [xml-data]
  (let [empty-tags? #(or (empty? %) (-> % str str/blank?))]
    (reduce (fn [col e]
              (if-not (empty-tags? (:content e))
                (conj col e)
                col))
            []
            xml-data)))

(def body (slurp "sample.xml")) ;; the above xml
(def xml-data (-> (xml/parse (java.io.StringReader. body)) :content))

(remove-empty-tags xml-data)

This returns, after converting to xml:

<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <school>
        <name/>
        <state/>
    </school>
    <college>
        <name>mit</name>
        <address/>
        <state/>
    </college>
</foo>

Clearly, this function needs to be recursive to remove empty child nodes using child-element?.

Suggestions?


Solution

  • I got this working with a combination of recursion and reduce (a completed version of my original partial answer). The key was to pass the head of each node into the recursive call, so that reduce can attach the transformed child nodes to that head.

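    (require '[clojure.data.xml :as xml]
             '[clojure.string :as str])

    ;; True when every item in e's :content is an Element (no text nodes).
    (defn- child-element? [e]
      (let [content (:content e)]
        (= (count content)
           (count (filter #(instance? clojure.data.xml.node.Element %) content)))))
    
    ;; True for nil, empty collections, and blank strings.
    (defn- empty-element? [e]
      (or (empty? e) (-> e str str/blank?)))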
    
    (defn element? [e]
      (and (instance? clojure.lang.LazySeq e)
           (instance? clojure.data.xml.node.Element (first e))))
    
    (defn remove-empty-element!
      "Remove empty elements (and child elements) in an xml"
      [head xml-data]
      (let [data (if (seq? xml-data) xml-data (:content xml-data))
            rs (reduce (fn [col e]
                  (let [content (:content e)]
                    (cond
                      (empty-element? content)
                      col
    
                      (and (not (element? content)) (not (every? empty-element? content)))
                      (conj col e)
    
                      (and (element? content) (every? true? (map #(empty-element? (:content %)) content)))
                      col
    
                      (child-element? content)
                      (let [_head (xml/element (:tag e) {})]
                        (conj col (remove-empty-element! _head content)))
    
                      :else col)))
                []
                data)]
        (assoc head :content rs)))
    
    
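    ;; test: xml-data is assumed here to be the parsed root element,
    ;; so that (:tag xml-data) is the root tag; head comes first, data second
    (remove-empty-element! (xml/element (:tag xml-data) {}) xml-data)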
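
    For comparison, the same idea (prune the children first, then decide whether the parent survives) can be sketched as a single bottom-up recursive function. This is only a minimal sketch under a few assumptions, not a drop-in replacement: it treats every non-string node as a map-like element with a :content key, treats text nodes as plain strings, and uses plain maps as sample data so it runs without clojure.data.xml on the classpath. The names prune-empty and sample are hypothetical. It should carry over to parsed elements as long as they support keyword lookup and assoc, which the code above already relies on.

```clojure
(require '[clojure.string :as str])

(defn prune-empty
  "Returns node with blank text and empty descendants removed,
  or nil when nothing is left of it."
  [node]
  (if (string? node)
    ;; Text node: keep it only if it is not blank.
    (when-not (str/blank? node) node)
    ;; Element: prune the children first; keep drops the nil results.
    (let [kids (keep prune-empty (:content node))]
      (when (seq kids)
        (assoc node :content kids)))))

;; Hypothetical sample data mirroring part of the XML in the question.
(def sample
  {:tag :foo :attrs {}
   :content [{:tag :name :attrs {} :content ["John"]}
             {:tag :phone :attrs {} :content []}
             {:tag :school :attrs {}
              :content [{:tag :name :attrs {} :content []}
                        {:tag :state :attrs {} :content []}]}
             {:tag :college :attrs {}
              :content [{:tag :name :attrs {} :content ["mit"]}
                        {:tag :address :attrs {} :content []}]}]})

;; Only :name and :college (with its :name child) survive under :foo.
(prune-empty sample)
```

    Two caveats with this sketch: an element that has attributes but no content is dropped as well, and the function returns nil when the whole tree is empty, so the caller has to handle that case.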