I currently have this line to filter and apply a function to an RDD.
data_to_update.rdd.map(find_differences).filter(lambda row: bool(row))
I want to modify the find_differences function to also take another argument, unique_id, in addition to row. I'm not exactly sure how to go about modifying this line to do that, or if there's a better way to write it.
Assuming that your current function looks something like this:
def find_differences(row):
    # do something
    return result
You can create a new function and a partial function that matches your original function:
from functools import partial

def find_differences_id(unique_id, row):
    # do something else
    return another_result

# Bind unique_id as the first positional argument; the result takes only row
find_differences = partial(find_differences_id, unique_id)
Then map the RDD as you did before, since find_differences once again takes a single row argument.
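As a rough, runnable sketch of the same idea, the snippet below uses Python's built-in map and filter on a plain list as a stand-in for the RDD, so it runs without Spark. The row layout (dicts with "key", "old" and "new" fields) and the comparison logic are assumptions made up for illustration, not taken from the question. It also shows a lambda as an alternative to partial.

```python
from functools import partial

def find_differences_id(unique_id, row):
    # Hypothetical logic: report rows whose "old" and "new" values differ,
    # tagged with the value of the column named by unique_id.
    if row["old"] == row["new"]:
        return None  # falsy, so the filter step drops it
    return {"id": row[unique_id], "old": row["old"], "new": row["new"]}

# Stand-in for data_to_update.rdd (assumed row shape)
rows = [
    {"key": 1, "old": "a", "new": "a"},
    {"key": 2, "old": "a", "new": "b"},
]

# Option 1: partial fixes unique_id, leaving a one-argument function of row
find_differences = partial(find_differences_id, "key")
with_partial = list(filter(bool, map(find_differences, rows)))

# Option 2: a lambda that passes the extra argument explicitly
with_lambda = list(
    filter(bool, map(lambda row: find_differences_id("key", row), rows))
)
```

On the actual RDD, the lambda form would presumably look like data_to_update.rdd.map(lambda row: find_differences_id(unique_id, row)).filter(bool), with unique_id defined on the driver beforehand.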