Just for transformation, map and foreachRDD can achieve the same goal, but which one is more efficient? And why?
For example, for a DStream[Int]:
val newDs1=Ds.map(x=> x+1)
val newDs2=Ds.foreachRDD (rdd=>rdd.map(x=> x+1))
I know foreachRDD operates on the RDD directly, whereas map seems to convert the DStream to an RDD first (I am not sure), so foreachRDD seems more efficient than map. However, map is a transformation while foreachRDD is an output operation, so map should be more efficient than foreachRDD for transformations. Does anybody know which is right, and why? Thanks for any reply.
Add one more comparison:
val newDS3=Ds.transform (rdd=>rdd.map(x=> x+1))
Which of the three is most efficient for transformation?
You could answer this question yourself if you checked the types. foreachRDD returns Unit, so what you have is:
val newDs2: Unit = Ds.foreachRDD (rdd=>rdd.map(x=> x+1))
Not only do you not get a DStream[_] back, but the inner map is never executed: it is lazy, and no action is ever called on its result.
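To make this concrete, here is a minimal sketch, assuming a local Spark Streaming setup fed through queueStream. The object name ForeachRddDemo, the helper run, the queue contents, and the 5-second timeout are all made up for illustration and are not from the question. It shows that foreachRDD yields Unit, and that the body only does work once it calls an action such as collect():

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object ForeachRddDemo {
  // Hypothetical helper: pushes one small batch through foreachRDD and
  // returns whatever was collected on the driver before the timeout.
  def run(): List[Int] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("foreachRDD-demo")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed input: a single batch holding 1, 2, 3.
    val queue = mutable.Queue[RDD[Int]](ssc.sparkContext.parallelize(Seq(1, 2, 3)))
    val ds = ssc.queueStream(queue)

    val seen = mutable.ArrayBuffer.empty[Int]

    // foreachRDD is an output operation: its result type is Unit, not DStream[Int].
    val result: Unit = ds.foreachRDD { rdd =>
      // Lazy: this only builds a new RDD description, and nothing is computed.
      rdd.map(_ + 1)

      // collect() is an action, so this map is actually evaluated.
      val out = rdd.map(_ + 1).collect()
      seen.synchronized { seen ++= out }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // arbitrary timeout for the sketch
    ssc.stop()

    seen.synchronized { seen.toList.sorted }
  }
}
```

The function passed to foreachRDD runs on the driver, which is why appending to a local buffer works here. In a real job you would normally write each batch to an external sink instead.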
The following two:
Ds.map(x=> x+1)
Ds.transform (rdd=>rdd.map(x=> x+1))
are identical in terms of execution, so it doesn't make sense to use the latter one, which is unnecessarily verbose.
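As a rough sketch of when each form is worth writing (the object and method names below are hypothetical, and the sortBy case is my own example rather than something from the question): map is enough for element-wise changes, while transform is mainly useful for RDD operations that DStream does not expose directly.

```scala
import org.apache.spark.streaming.dstream.DStream

object MapVsTransform {
  // Element-wise change: map on the DStream is all that is needed.
  def viaMap(ds: DStream[Int]): DStream[Int] =
    ds.map(_ + 1)

  // Same per-batch computation, just spelled out against each batch RDD.
  def viaTransform(ds: DStream[Int]): DStream[Int] =
    ds.transform(rdd => rdd.map(_ + 1))

  // Assumed use case: sortBy exists on RDD but not on DStream,
  // so here transform is the natural way to reach it.
  def sortedPerBatch(ds: DStream[Int]): DStream[Int] =
    ds.transform(rdd => rdd.sortBy(x => x))
}
```

All three return a DStream[Int] and stay lazy until an output operation such as print() or foreachRDD is registered on the result.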