Apache Spark has the concept of a Resilient Distributed Dataset (RDD). An RDD is:

> an immutable, distributed collection of objects. Each dataset in an RDD is divided into logical partitions, which may be computed on different nodes of the cluster.

> Formally, an RDD is a read-only, partitioned collection of records. RDDs can be created through deterministic operations on either data in stable storage or other RDDs. An RDD is a fault-tolerant collection of elements that can be operated on in parallel.
Now, Clojure has immutable data structures and supports running higher-order functions in parallel.
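For example, parallel mapping is already built into the core library (a tiny illustration using clojure.core/pmap):

```clojure
;; pmap is a lazy, parallel variant of map from clojure.core;
;; elements are processed concurrently on future-backed threads.
(pmap #(* % %) (range 10))
;; => (0 1 4 9 16 25 36 49 64 81)
```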
I'm aware of Flambo and Sparkling, but those are Clojure interfaces to Spark itself. I'm not looking for an interface; I'm looking for an equivalent data structure.
My question is: Is there an equivalent to the Resilient Distributed Dataset in native Clojure?
Well, an ordinary Clojure map or vector can easily be processed in parallel, in sub-partitions across multiple cores, using core.reducers/fold, as sketched below.
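A minimal sketch (the data vector and the explicit partition size are just illustrative):

```clojure
(require '[clojure.core.reducers :as r])

;; an illustrative vector of a million numbers
(def data (vec (range 1000000)))

;; fold splits the vector into chunks (512 elements by default),
;; reduces each chunk on a fork/join thread, and merges the partial
;; results with the combining function (+ serves as both here).
(r/fold + (r/map inc data))
;; => 500000500000

;; a custom partition size can be passed as the first argument
(r/fold 8192 + + (r/map inc data))
```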
Maps and vectors being immutable by default, this setup seems roughly equivalent to what RDDs are. The difference is that fold computes across multiple cores, not multiple machines. So it's parallel, but not distributed.
Onyx and Storm are distributed computing frameworks implemented in Clojure that can do what Spark does; they are probably as close as it gets to Spark's RDDs in native Clojure.