Here is an abbreviated version of the code behind IgniteRDD, the IgniteSqlRDD class:
class IgniteSqlRDD[R: ClassTag, T, K, V](
    ic: IgniteContext,
    cacheName: String,
    cacheCfg: CacheConfiguration[K, V],
    qry: Query[T],
    conv: (T) ⇒ R,
    keepBinary: Boolean
) extends IgniteAbstractRDD[R, K, V](ic, cacheName, cacheCfg, keepBinary) {
    override def compute(split: Partition, context: TaskContext): Iterator[R] = {
        new IgniteQueryIterator[T, R](ensureCache().query(qry).iterator(), conv)
    }

    override protected def getPartitions: Array[Partition] = {
        Array(new IgnitePartition(0))
    }
}
I noticed that the number of partitions is hard-coded to one. With a parallelism of one, this will significantly reduce performance. Why was it designed this way? Thanks!
IgniteSqlRDD is an internal implementation used only for result sets that are fully fetched to the driver, so this RDD is not distributed. Thus there is only one partition. IgniteRDD, on the other hand, represents an Ignite cache, which is distributed.
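
To see the difference in practice, you can compare the partition counts of the two RDDs. The snippet below is only a sketch: it assumes an already-running Ignite cluster and an existing IgniteContext, and the object name, the method names, and the cacheName, typeName and sql arguments are placeholders made up for illustration. It also assumes that objectSql hands back the single-partition IgniteSqlRDD shown in the question, which is how it looks in the Ignite source I have seen but may differ between versions.

```scala
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Ignite or Spark.
object PartitionCheck {

  // Returns (partitions of the cache-backed RDD, partitions of the SQL result RDD).
  // cacheName, typeName and sql are placeholders supplied by the caller.
  def partitionCounts[K, V](
      ic: IgniteContext,
      cacheName: String,
      typeName: String,
      sql: String): (Int, Int) = {
    // IgniteRDD: a view over the distributed cache.
    val cacheRdd = ic.fromCache[K, V](cacheName)

    // SQL result set; assumed here to be backed by IgniteSqlRDD,
    // whose getPartitions returns a single IgnitePartition.
    val sqlRdd: RDD[(K, V)] = cacheRdd.objectSql(typeName, sql)

    (cacheRdd.getNumPartitions, sqlRdd.getNumPartitions)
  }

  // Plain Spark repartitioning: the query itself still runs as one task,
  // but the stages after the shuffle can use n partitions.
  def spread[T](rdd: RDD[T], n: Int): RDD[T] = rdd.repartition(n)
}
```

If the assumption holds, the second count is 1, while the first follows the cache's affinity partitions. When the work done on the query result is heavy, calling repartition on it is one way to get parallelism back for the later stages, at the cost of a shuffle.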