I have two rdd
rdd1 =[('1', 3428), ('2', 2991), ('3', 2990), ('4', 2883), ('5', 2672), ('5', 2653)]
rdd2 = [['1', 'Toy Story (1995)'], ['2', 'Jumanji (1995)'], ['3', 'Grumpier Old Men (1995)']]
I want to perform an operation to relace first rdd's first element with second rdd's second element
My final result will be like this
[(''Toy Story (1995)'', 3428), ('Jumanji (1995)', 2991), ('Grumpier Old Men (1995)', 2990)]
Please refer me a way to perform this
Use join and map:
rdd1.join(rdd2).map(lambda x: (x[1][1], x[1][0])).collect()
#[('Toy Story (1995)', 3428),
# ('Jumanji (1995)', 2991),
# ('Grumpier Old Men (1995)', 2990)]