Say I have a regular Python list [1, 2]
and an RDD with 2 items, like [('hi', 'bye'), ('hi', 'bye')],
and I want the items to become
('hi', 'bye', 1)
('hi', 'bye', 2)
Essentially, I want to append each element of the list to the corresponding item in the RDD. I feel like this should be simple, but I can't think of the logic :/
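You can use the zip method of RDD, which pairs the elements of two RDDs position by position. It assumes both RDDs have the same number of partitions and the same number of elements in each partition; creating both from equal-length lists with the same number of slices satisfies that: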
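from pyspark import SparkContext
sc = SparkContext.getOrCreate()

# Same length and same slice count, so zip can pair elements one-to-one
rdd1 = sc.parallelize([('hi', 'bye'), ('hi', 'bye')], 2)
rdd2 = sc.parallelize([1, 2], 2)
# Each zipped element is (tuple, number); append the number to the tuple
rdd3 = rdd1.zip(rdd2).map(lambda x: x[0] + (x[1],))
rdd3.collect()
# [('hi', 'bye', 1), ('hi', 'bye', 2)]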