I'm looking for a good reference on large-scale data mining with Clojure.
I know of many good Clojure programming books (Programming Clojure, The Joy of Clojure, ...) and many good data mining textbooks (Mining of Massive Datasets, Managing Gigabytes, ...). However, I'm not aware of any reference that specifically addresses large-scale data mining with Clojure.
The "with clojure" part is rather important to me for the following reasons:
* most theoretical analysis uses big-O running time, which ignores constants
* constants matter when the difference is 1 second vs. 1 hour (for things that need to be real-time)
* or 1 hour vs. 1 week (for batch jobs)
In particular, I think there is a lot of interplay between the JVM, Clojure's data structures, and whether data is held in memory or read lazily from disk, so that "slightly" different implementations of the "same" algorithm can have drastically different running times.
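A minimal sketch of the kind of difference meant here, assuming a text file with one integer per line. The namespace and the function names `sum-eager` and `sum-lazy` are hypothetical and only for illustration. Both functions compute the same sum, but they use memory very differently on a large file:

```clojure
(ns example.sum-lines
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

;; Eager version: slurp reads the whole file into one String, and
;; split-lines builds a vector of every line before any summing starts.
;; Memory use grows with the size of the file.
(defn sum-eager [path]
  (->> (slurp path)
       str/split-lines
       (map #(Long/parseLong %))
       (reduce + 0)))

;; Lazy version: line-seq reads lines on demand, and reduce consumes
;; them inside with-open without retaining the head of the sequence,
;; so memory use stays roughly constant regardless of file size.
(defn sum-lazy [path]
  (with-open [rdr (io/reader path)]
    (reduce + 0 (map #(Long/parseLong %) (line-seq rdr)))))
```

Calling `(sum-lazy "numbers.txt")` on a file too large for the heap should still work, whereas `sum-eager` would likely fail with an `OutOfMemoryError`. The file name is an assumption for the example.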
Thus, my question (all of the above was to avoid being closed with "Check Google"):
What is a good resource on massive data mining with Clojure?
Thanks!
I don't think anyone has written a good comprehensive reference yet. But there is certainly lots of work going on in this space (my own company included!).
Some interesting links to follow up: