Tags: functional-programming, clojure, pipeline, dataflow, vector-processing

Pluggable vector processing units in Clojure


I'm developing some simulation software in Clojure that will need to process lots of vector data (basically originating as offsets into arrays of Java floats, typically 10 to 10,000 elements long). Large numbers of these vectors will need to go through various processing steps, e.g. normalising the vectors, concatenating two streams of vectors together, calculating a moving average, etc.

Rather than doing everything in an imperative style, what I was hoping to do was create a more functional-style Clojure solution that would do the following:

  • allow any vector function to be turned into a pluggable module, e.g. (def module-a (make-module some-function))
  • allow these modules to be composed in pipelines, e.g. (def combined-module (combine-in-series module-a module-b)) would feed the output of module-a into the input of module-b
  • allow auxiliary functions to access state stored within a given module, e.g. (get-moving-average some-moving-average-module), which would need to work even if some-moving-average-module is embedded deep within a combined pipeline
  • hide any boilerplate code behind the scenes, e.g. allocating sufficiently large temporary arrays for vector calculation.

Does this sound like a sensible approach?

If so, any implementation hints or libraries that might help?


Solution

  • In a functional language, everything is dataflow. You can use functions as your module concept.

    To address each of your use-cases:

    • A pluggable module is a Clojure function that takes a single argument: the state of your data vector, e.g. (def module-a some-function). To allow for easy extension by modules, I suggest using a Clojure map as your state, where one field is your array of floats.
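    • Composing modules is function composition, e.g. (def combined-module (comp module-b module-a)). Clojure's comp applies its arguments right to left, so this feeds the output of module-a into the input of module-b.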
    • Auxiliary functions are accessor functions that extract state from your data. For example, if your data is a Clojure map with a :moving-average field, then the keyword :moving-average is itself your accessor function. State is not stored in modules.
    • Boilerplate code is hidden in the implementation of your functions, which can be declared anywhere, possibly in another file and namespace.
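The four points above can be sketched together as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names map-floats, normalise, moving-average and pipeline, the :data key for the float array, and the choice of an element-wise exponential moving average weighted by a hypothetical parameter alpha are all assumptions made here, since the question does not pin down which kind of moving average is wanted. Each module is a plain function from a state map to a new state map, composition is just comp, the running average lives in the state under :moving-average (so the keyword is the accessor), and array allocation is hidden inside a helper.

```clojure
;; Hypothetical helper: hides the allocate-and-fill boilerplate.
;; Returns a fresh float array of length n whose element i is (f i).
(defn- map-floats [n f]
  (let [^floats out (float-array n)]
    (dotimes [i n]
      (aset out i (float (f i))))
    out))

;; Hypothetical module: scales the :data array to unit Euclidean length.
;; It never mutates its input; it returns the state with a new :data array.
(defn normalise [state]
  (let [^floats xs (:data state)
        len (Math/sqrt (areduce xs i acc 0.0
                                (+ acc (* (aget xs i) (aget xs i)))))]
    (assoc state :data
           (map-floats (alength xs)
                       (fn [i] (if (zero? len) 0.0 (/ (aget xs i) len)))))))

;; Hypothetical module factory: returns a module that keeps an element-wise
;; exponential moving average of :data under :moving-average.
;; alpha (assumed parameter) is the weight given to the newest vector.
(defn moving-average [alpha]
  (fn [state]
    (let [^floats xs  (:data state)
          ;; on the first call there is no previous average, so start from xs
          ^floats avg (or (:moving-average state) xs)]
      (assoc state :moving-average
             (map-floats (alength xs)
                         (fn [i] (+ (* alpha (aget xs i))
                                    (* (- 1.0 alpha) (aget avg i)))))))))

;; comp applies right to left: normalise runs first, then the averaging step.
(def pipeline (comp (moving-average 0.5) normalise))

;; State is threaded through as data: feed each new vector into the previous
;; output state so the running average carries over between calls.
(def s1 (pipeline {:data (float-array [1 0])}))
(def s2 (pipeline (assoc s1 :data (float-array [0 5]))))

;; The keyword is the accessor, however deep the module sits in the pipeline.
(vec (:moving-average s2)) ; => [0.5 0.5]
```

Because every step returns a fresh array instead of overwriting its input, earlier states stay valid and modules can be recombined freely; the cost is one allocation per step, and whether that matters at your vector sizes is something to measure.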