Tags: python, python-3.x, pyspark, lambda

reduceByKey with multiple values to sum in pyspark


I have a csv file that looks like the following:

Philadelphia 30 40
Philadelphia 45 60
Philadelphia  3  4
Wilmington   10 20
Wilmington   15 30
Wilmington    1  2

I read it into PySpark using a map function to parse each line. I now have my data (called input_data) looking like this:

("Philadelphia", 30, 40)
("Philadelphia", 45, 60)
("Philadelphia", 3, 4)
("Wilmington", 10, 20)
("Wilmington", 15, 30)
("Wilmington", 1, 2)

I want to use reduceByKey on this RDD to group by city and sum the values in each tuple, producing something like this:

("Philadelphia", 78, 104)
("Wilmington", 26, 52)

However, when I try to run a reduceByKey line like this

temp = input_data.reduceByKey(lambda x, y: (x[1] + y[1]))

I receive an error indicating that there are too many values to unpack (expected 2). I'm trying to understand what is going on here: why aren't the numeric values grouping as expected? Any additional context that helps me understand this better would be greatly appreciated!


Solution

  • Your RDD (input_data):

    ('Philadelphia', 30, 40)
    ('Philadelphia', 45, 60)
    ('Philadelphia', 3, 4)
    ('Wilmington', 10, 20)
    ('Wilmington', 15, 30)
    ('Wilmington', 1, 2)
    

    reduceByKey only works on pair RDDs, i.e. RDDs whose elements are (key, value) 2-tuples. Your elements are 3-tuples, so Spark cannot unpack them into a key and a value, which is what produces the "too many values to unpack (expected 2)" error. The fix is to first map each row into a (city, (value1, value2)) pair and then aggregate. Try this:

    (
        input_data
        # Re-shape each row into a (key, value) pair: (city, (value1, value2)).
        .map(lambda row: (row[0], (row[1], row[2])))
        # Collect all the (value1, value2) tuples for each city.
        .groupByKey()
        # Sum each position of the grouped tuples separately.
        .map(lambda row: (
            row[0],
            sum([tup[0] for tup in list(row[1])]),
            sum([tup[1] for tup in list(row[1])])
        ))
        .collect()
    )
    

    Output:

    ('Wilmington', 26, 52)
    ('Philadelphia', 78, 104)
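
    If you would rather stick with reduceByKey, as in your original attempt, the same (city, (value1, value2)) pairing makes it work; this is just a sketch of that route:

    (
        input_data
        .map(lambda row: (row[0], (row[1], row[2])))
        # Combine the two-element value tuples pairwise, summing each position.
        .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
        # Flatten (city, (sum1, sum2)) back into (city, sum1, sum2).
        .map(lambda row: (row[0], row[1][0], row[1][1]))
        .collect()
    )

    Because reduceByKey combines values on each partition before shuffling, it generally scales better than groupByKey when a key has many rows.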