Search code examples
ignite

IgniteSqlRDD has only one partition


I brief the code of IgniteRDD,

class IgniteSqlRDD[R: ClassTag, T, K, V](
    ic: IgniteContext,
    cacheName: String,
    cacheCfg: CacheConfiguration[K, V],
    qry: Query[T],
    conv: (T) ⇒ R,
    keepBinary: Boolean
) extends IgniteAbstractRDD[R, K, V](ic, cacheName, cacheCfg, keepBinary) {
    override def compute(split: Partition, context: TaskContext): Iterator[R] = {
        new IgniteQueryIterator[T, R](ensureCache().query(qry).iterator(), conv)
    }

    override protected def getPartitions: Array[Partition] = {
        Array(new IgnitePartition(0))
    }
}

I noticed that it has hard coded the number of partitions which is only one, this will significantly reduce the performance with parallelism being one . I would ask why it is so designed, thanks!


Solution

  • IgniteSqlRDD is an internal implementation used only for result sets which a are fully fetched to the driver, so this RDD is not distributed. Thus there is only one partition.

    IgniteRDD on the other hand represents an Ignite cache which is distributed.