ClojureScript: Hickory on Node.js (parse HTML to hiccup)

I need a way to parse HTML markup into hiccup in a Node.js app written in ClojureScript. On the client side I used hickory for the job, which unfortunately doesn't play nicely with Node.js. If any namespace requires hickory.core, Node refuses to run the app, saying:

ReferenceError: Node is not defined
    at hickory$core$node_type (/media/lapdaten/ARBEITSSD/dev/violinas_macchiato/target/out/hickory/core.cljs:35:1)
    at Object.<anonymous> (/media/lapdaten/ARBEITSSD/dev/violinas_macchiato/target/out/hickory/core.cljs:39:16)

If I hot-load the library with figwheel while Node is already running, CIDER gives me code completion for the various hickory functions, but hickory.core/parse-fragment is undefined at runtime (hickory.core/as-hiccup is available, for some reason).

This is actually a known problem with hickory: it depends on the browser DOM API, which is unavailable in Node.js. I tried (set! js/DOMParser (.-DOMParser (js/require "xmldom"))) as suggested on GitHub, but I don't know where to put that expression. In general, the discussions on GitHub left me without a clue…

Has anyone gotten hickory to work on Node.js? Any other suggestions for how my app could convert HTML to hiccup?

Many thanks in advance!

Oliver

Solution

Since hickory doesn't support Node.js in a way that I could understand, I've recently been looking into native Node.js solutions. Behold posthtml-parser. The nice thing about it is that the JSON it produces is only one js->clj away from being almost exactly hickory format, i.e. the following:

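(ns utils.phtmltohiccup
  (:require
   [clojure.walk :as walk]         ; needed by the postwalk-based converter
   ;; requiring the shadow-cljs way; this assumes a release whose module export
   ;; is the parser function itself (newer releases appear to export `parser`)
   ["posthtml-parser" :as phr]))
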
(def testhtml
  "<ul class=\"list\" important=\"false\"><li>Hello World</li><li>Hello Again</li></ul>")

(js->clj (phr testhtml) :keywordize-keys true)

produces:

[{:tag "ul"
  :attrs
  {:class "list"
   :important "false"}
  :content
  [{:tag "li"
    :content
    ["Hello World"]}
   {:tag "li"
    :content
    ["Hello Again"]}]}]

The only differences from proper hickory seem to be the lack of :type keys (the type is implicitly :element) and the :tag values being strings rather than keywords. This structure is highly workable from within ClojureScript as it is. When I do need hiccup, I use one of two very naive functions to convert the structure above. Stack-consuming:

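(defn parsed-to-hiccup-sc
  "Converts posthtml-parser output to hiccup by plain recursion.
  Stack depth grows with the nesting depth of the markup."
  [hickory]
  (map
   (fn [element]
     (if (:tag element)
       (let [{:keys [tag attrs content]} element]
         ;; attrs is nil for elements without attributes
         (into [(keyword tag) attrs] (parsed-to-hiccup-sc content)))
       ;; anything without a :tag is treated as a text node
       (str element)))
   hickory))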

Alternatively I use clojure.walk (postwalk is recursive as well, so it is not stack-free either, but it saves writing the traversal by hand):

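(defn parsed-to-hiccup-ns
  "Converts posthtml-parser output to hiccup with clojure.walk/postwalk.
  Children are visited first, so :content already holds hiccup when an
  element map is reached."
  [hickory]
  (walk/postwalk
   (fn [element]
     (if (and (map? element) (:tag element))
       (let [{:keys [tag attrs content]} element]
         (into [(keyword tag) attrs] content))
       ;; leave keywords, strings and attribute maps untouched;
       ;; calling str here would stringify map keys and break the walk
       element))
   hickory))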
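
As a quick check of the walk-based approach, here is a minimal self-contained sketch. It uses the parser output shown earlier as a literal, so posthtml-parser itself is not needed to try it. The names `sample` and `->hiccup` are made up for this example, and leaving out the attribute map for elements without attributes is my own variation, not something posthtml-parser or hickory prescribes.

```clojure
(ns example.hiccup-check
  (:require [clojure.walk :as walk]))

;; Parser output from above, written as a literal so the example
;; does not depend on posthtml-parser.
(def sample
  [{:tag "ul"
    :attrs {:class "list" :important "false"}
    :content [{:tag "li" :content ["Hello World"]}
              {:tag "li" :content ["Hello Again"]}]}])

(defn ->hiccup
  "Hypothetical helper: turns parser output into hiccup, omitting the
  attribute map when an element has no attributes."
  [nodes]
  (walk/postwalk
   (fn [x]
     ;; only maps with a string :tag are treated as elements
     (if (and (map? x) (string? (:tag x)))
       (let [{:keys [tag attrs content]} x]
         (into (cond-> [(keyword tag)]
                 (seq attrs) (conj attrs))
               content))
       ;; everything else (keywords, strings, attribute maps) passes through
       x))
   nodes))

(->hiccup sample)
;; => [[:ul {:class "list", :important "false"}
;;      [:li "Hello World"] [:li "Hello Again"]]]
```

One caveat with any walk-based version: postwalk also visits the attribute maps, so an element carrying an attribute literally named tag would have its attribute map mistaken for an element. The plain recursive function does not have that problem, because it only ever inspects nodes and their :content.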

For the time being this solution is good enough for my purposes. I will, however, bring this library to the attention of the hickory maintainers. Maybe there turns out to be an easy way to integrate posthtml-parser into hickory for proper Node.js support.
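
For anyone who needs data in hickory's own shape rather than hiccup, the following is a rough sketch of how the parser output might be normalized. It rests on my understanding that hickory element maps carry :type :element and use keywords for :tag; the name `->hickory` is hypothetical, and other node types hickory knows about (documents, comments, doctypes) are not handled.

```clojure
(ns example.to-hickory)

(defn ->hickory
  "Hypothetical helper: adds :type :element and keywordizes :tag on every
  element map; strings (text nodes) are returned unchanged."
  [node]
  (if (map? node)
    (-> node
        (assoc :type :element)
        (update :tag keyword)
        ;; recurse into children; :content stays nil when there are none
        (update :content #(some->> % (mapv ->hickory))))
    node))

(->hickory {:tag "li" :content ["Hello World"]})
;; => {:tag :li, :content ["Hello World"], :type :element}
```

Calling `(mapv ->hickory parsed)` on the top-level vector returned by js->clj would convert a whole fragment.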