Clojurescript Hickory on NodeJS (parse HTML to hiccup)


I need a way to parse HTML markup to hiccup in a Node.js app written in ClojureScript. On the client side I used hickory for the job, which unfortunately doesn't play nicely with Node.js. If any namespace requires hickory.core, Node refuses to run the app, saying:

ReferenceError: Node is not defined
    at hickory$core$node_type (/media/lapdaten/ARBEITSSD/dev/violinas_macchiato/target/out/hickory/core.cljs:35:1)
    at Object.<anonymous> (/media/lapdaten/ARBEITSSD/dev/violinas_macchiato/target/out/hickory/core.cljs:39:16)

If I hot-load the library with figwheel while Node is already running, CIDER gives me code completion for the various hickory functions, but hickory.core/parse-fragment is undefined at runtime (hickory.core/as-hiccup is available, for some reason).

This is actually a known problem with hickory: it depends on the browser DOM API, which is unavailable in Node.js. I tried (set! js/DOMParser (.-DOMParser (js/require "xmldom"))) as suggested on GitHub, but I don't actually know where to put that expression. Generally, the discussions on GitHub left me without a clue…

Has anyone gotten hickory to work on Node.js? Any other suggestions as to how I may have my app convert HTML to hiccup?

Many thanks in advance!

Oliver


Solution

  • Since hickory doesn't support Node.js in a way that I could understand, I've recently been looking into native Node.js solutions. Behold posthtml-parser. The nice thing about it is that the JSON it produces is only one js->clj away from being almost exactly hickory format, i.e. the following:

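    (ns utils.phtmltohiccup
      (:require
       ["posthtml-parser" :as phr]   ; string require, the shadow-cljs way
       [clojure.walk :as walk]))     ; used by parsed-to-hiccup-ns below

    (def testhtml
      "<ul class=\"list\" important=\"false\"><li>Hello World</li><li>Hello Again</li></ul>")

    ;; Parse the markup, then convert the resulting JS tree to Clojure data
    (js->clj (phr testhtml) :keywordize-keys true)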
    

    produces:

    [{:tag "ul"
      :attrs
      {:class "list"
       :important "false"}
      :content
      [{:tag "li"
        :content
        ["Hello World"]}
       {:tag "li"
        :content
        ["Hello Again"]}]}]
    

    The only difference from proper hickory seems to be the lack of :type keys (every node with a :tag is simply assumed to be of type :element); tags also come back as strings rather than keywords. This structure is highly workable from within ClojureScript as it is. When I do need hiccup, I use one of two very naive functions to convert the hickory-like data above to hiccup. The first one is stack-consuming:

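    (defn parsed-to-hiccup-sc
      "Converts a seq of parsed nodes to hiccup by plain recursion.
      Recursion depth grows with the nesting depth of the markup."
      [hickory]
      (map
       (fn [element]
         (if (:tag element)
           ;; element node: [tag attrs & converted-children]
           (let [{:keys [tag attrs content]} element]
             (into [(keyword tag) attrs] (parsed-to-hiccup-sc content)))
           ;; text node
           (str element)))
       hickory))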
    

    Alternatively, I use clojure.walk. Note that postwalk is itself implemented recursively, so it still consumes stack in proportion to the nesting depth:

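    ;; needs [clojure.walk :as walk] in the ns :require
    (defn parsed-to-hiccup-ns
      "Converts parsed nodes to hiccup with clojure.walk/postwalk."
      [hickory]
      (walk/postwalk
       (fn [element]
         ;; postwalk visits every form (keywords, map entries, attrs maps, ...),
         ;; so only element maps are rewritten; everything else passes through.
         (if (and (map? element) (:tag element))
           (let [{:keys [tag attrs content]} element]
             (into [(keyword tag) attrs] content))
           element))
       hickory))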
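
    If real hickory data is needed rather than hiccup, the missing :type keys mentioned above can be filled in with the same kind of walk. The following is only a minimal sketch: the namespace and the name parsed->hickory are made up for illustration, it assumes that every map with a string :tag is an element node (an attribute literally named "tag" would confuse it), and it assumes hickory's documented element shape with a keyword :tag and :type :element. It does not try to reproduce anything else hickory may add, such as document or comment nodes.

    ```clojure
    (ns utils.phtmltohickory            ; hypothetical namespace
      (:require [clojure.walk :as walk]))

    (defn parsed->hickory
      "Adds :type :element and keywordizes :tag on every element map.
      Assumption: any map whose :tag is a string is an element node."
      [parsed]
      (walk/postwalk
       (fn [node]
         (if (and (map? node) (string? (:tag node)))
           (-> node
               (assoc :type :element)
               (update :tag keyword))
           node))                       ; strings, attrs maps etc. are left alone
       parsed))

    (parsed->hickory [{:tag "li" :content ["Hello World"]}])
    ;; => [{:tag :li, :content ["Hello World"], :type :element}]
    ```

    Text nodes stay plain strings inside :content, which matches how they already come out of js->clj.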
    

    For the time being, this solution is good enough for my purposes. I will, however, bring this library to the attention of the hickory maintainers. Maybe there turns out to be an easy way to integrate posthtml-parser into hickory for proper Node.js support.