I have a vector of words, for example ["john" "said"... "john" "walked"...], and I want to build a hash map from each word to the number of times each other word follows it, for example {"john" {"said" 1 "walked" 1 "kicked" 3}}.
The best solution I came up with was recursively walking through the list by index and using assoc to keep updating the hash map, but that seems really messy. Is there a more idiomatic way of doing this?
Given you have words:
(def words ["john" "said" "lara" "chased" "john" "walked" "lara" "chased"])
Use this transformation function:
(defn transform
  [words]
  (->> words
       (partition 2 1)
       (reduce (fn [acc [w next-w]]
                 ;; could be shortened to #(update-in %1 %2 (fnil inc 0))
                 (update-in acc
                            [w next-w]
                            (fnil inc 0)))
               {})))
(transform words)
;; {"walked" {"lara" 1}, "chased" {"john" 1}, "lara" {"chased" 2}, "said" {"lara" 1}, "john" {"walked" 1, "said" 1}}
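To see why this works, it can help to look at the two building blocks in isolation. The following is only a sketch using clojure.core; the names pairs and counts are illustrative and not part of the solution above.

```clojure
(def words ["john" "said" "lara" "chased" "john" "walked" "lara" "chased"])

;; (partition 2 1 coll) slides a window of size 2 forward one step at a time,
;; yielding every adjacent (word, next-word) pair.
(def pairs (partition 2 1 words))
;; (take 3 pairs) => (("john" "said") ("said" "lara") ("lara" "chased"))

;; (fnil inc 0) behaves like inc, but treats a missing (nil) count as 0,
;; so update-in can create the nested entry the first time a pair is seen.
(def counts
  (-> {}
      (update-in ["lara" "chased"] (fnil inc 0))
      (update-in ["lara" "chased"] (fnil inc 0))))
;; counts => {"lara" {"chased" 2}}
```

Reducing over pairs with that update-in call is all that transform does.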
EDIT: You can gain performance using transient hash-maps like this:
(defn transform-fast
  [words]
  (->> (map vector words (next words))
       (reduce (fn [acc [w1 w2]]
                 (let [c-map (get acc w1 (transient {}))]
                   (assoc! acc w1 (assoc! c-map w2
                                          (inc (get c-map w2 0))))))
               (transient {}))
       persistent!
       (reduce-kv (fn [acc w1 c-map]
                    (assoc! acc w1 (persistent! c-map)))
                  (transient {}))
       persistent!))
Obviously, the resulting code doesn't look as nice, and such an optimization should only be made if performance is critical.
(Criterium says it beats Michał Marczyk's transform*, being roughly twice as fast on King Lear.)
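The transient pattern in transform-fast may be easier to follow on a flat map first. Here is a minimal sketch, using the hypothetical name count-words, that counts single words instead of word pairs; it follows the same approach clojure.core/frequencies takes.

```clojure
(defn count-words
  "Counts occurrences of each word using a transient map as the accumulator."
  [words]
  (persistent!                                   ; freeze the result once, at the end
   (reduce (fn [acc w]
             ;; Always use the return value of assoc!; lookups with get
             ;; work on transients just as they do on persistent maps.
             (assoc! acc w (inc (get acc w 0))))
           (transient {})
           words)))

(count-words ["john" "said" "john"])
;; => {"john" 2, "said" 1}
```

transform-fast applies the same idea one level deeper: the inner count maps are kept transient while reducing, which is why a second pass with reduce-kv is needed to make each of them persistent.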