Is there any way to deal with RDDs that contain only a single element (this can sometimes happen in what I am doing)? When that's the case, reduce stops working, since the operation requires two inputs.
I am working with key-value pairs such as:
(key1, 10),
(key2, 20),
And I want to aggregate their values, so the result should be:
30
But in some cases the RDD contains only a single key-value pair, and then reduce does not work. For example:
(key1, 10)
This will return nothing.
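The reducing function isn't shown, so assume for illustration that it indexes into the pairs, e.g. lambda a, b: a[1] + b[1]. PySpark's RDD.reduce uses Python's functools.reduce internally. When there is only one element, the function is never called, so you get the whole ('key1', 10) tuple back instead of 10. This plain-Python sketch reproduces that behaviour without a cluster (sum_pair_values is a hypothetical name):

```python
from functools import reduce

pairs_two = [("key1", 10), ("key2", 20)]
pairs_one = [("key1", 10)]

# Hypothetical reducer that indexes into the (key, value) tuples.
sum_pair_values = lambda a, b: a[1] + b[1]

print(reduce(sum_pair_values, pairs_two))  # 30
# With one element the reducer is never called: the tuple comes back as-is.
print(reduce(sum_pair_values, pairs_one))  # ('key1', 10)
```

The same reducer also breaks with three or more pairs. After the first call, a is an int, so a[1] raises TypeError.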
If you call .values() before calling reduce, it works even if there is only one element in the RDD:
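from operator import add
# sc is the SparkContext (predefined in the pyspark shell)
rdd = sc.parallelize([('key1', 10)])
# Drop the keys so reduce only ever combines plain numbers
rdd.values().reduce(add)
# 10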
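Calling values() first means the function only ever sees numbers, so a lone element is already the correct total. An empty RDD is a separate case: reduce raises ValueError there, whereas RDD.fold(0, add) and RDD.sum() return 0. The sketch below mimics these behaviours in plain Python with functools.reduce rather than Spark. values_sum and values_fold are hypothetical helper names:

```python
from functools import reduce
from operator import add


def values_sum(pairs):
    """Sum the values of (key, value) pairs, like rdd.values().reduce(add)."""
    values = [v for _, v in pairs]
    # Empty input raises TypeError here (Spark's reduce raises ValueError).
    return reduce(add, values)


def values_fold(pairs, zero=0):
    """Like rdd.values().fold(0, add): the zero value makes empty input safe."""
    return reduce(add, (v for _, v in pairs), zero)


print(values_sum([("key1", 10), ("key2", 20)]))  # 30
print(values_sum([("key1", 10)]))  # 10
print(values_fold([]))  # 0
```

With Spark's fold, the zero value is applied once per partition and again when the partition results are merged. It must therefore be an identity for the operation, such as 0 for addition.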