How to use correctly mapPartitions function

I'm doing a program with big data and that's why I'm using Spark and Scala. I need to partition the database and for this I use

var data0 = conf.dataBase.repartition (8) .persist (StorageLevel.MEMORY_AND_DISK_SER)

but then I need to do things in the partition before proceeding to work with the piece of database corresponding to that partition and for that I use

var tester = data0.mapPartitions {x =>
   configFuzzyPredProblem ()
   Strategy.getStrategy.executeStrategy (conf.iterByRun, 5, GeneratorType.HillClimbing)
 } .persist (StorageLevel.MEMORY_AND_DISK_SER)

Within the method executeStrategy() I use the database but I do not know if it is the global one or the one corresponding to that partition. How can I know which one I am using and then only perform the partition processing with the database of that partition?

Solution

Here is a simple example using mapPartitionsWithIndex which follows the same rules of mapPartitions - excluding the index aspect.

You can see that inside mapPartitions you need to process an interable, an Interator Int in this example. In this case 3 partitions are processed, in your case 8, with either some entries or possibly zero entries.

val rdd1 = sc.parallelize(List(1,2,3,4,5,6,7,8,9,10), 3)
def myfunc(index: Int, iter: Iterator[Int]) : Iterator[String] = {
    iter.map(x => index + "," + x)
}
val rdd2 = rdd1.mapPartitionsWithIndex(myfunc)

I cannot see inside your function but I assume it is OK and it will process a partition - part of your database.