Remove RDD values with condition

I have an RDD like this:

[ (Person 1, [Cat, Dog, Cow]), (Person 2, [Cat]), (Person 3,[Cow, Chicken])]

And I have a list of frequent animals:

freq_animals=[Cat, Dog]

For each person, I want to remove the values that are not in the list of frequent animals. The output would be:

[ (Person 1, [Cat, Dog]), (Person 2, [Cat]), (Person 3,[])]

Any idea how I could change my RDD? Thank you!

Solution

You can use mapValues with a list comprehension to filter each person's list:

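from pyspark import SparkContext

# Reuse the active SparkContext, or create one if none exists
sc = SparkContext.getOrCreate()

rdd = sc.parallelize([("Person 1", ["Cat", "Dog", "Cow"]), ("Person 2", ["Cat"]), ("Person 3", ["Cow", "Chicken"])])

freq_animals = ["Cat", "Dog"]

# mapValues leaves the keys untouched and transforms only the value lists
rdd2 = rdd.mapValues(lambda v: [i for i in v if i in freq_animals])

print(rdd2.collect())
# [('Person 1', ['Cat', 'Dog']), ('Person 2', ['Cat']), ('Person 3', [])]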
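
If the list of frequent animals is long, checking membership against a list for every value can get slow. A minimal sketch of one possible variation is to build the filter around a frozenset so each lookup is a hash check. The helper name make_keep_filter is hypothetical and not part of PySpark; the function is plain Python, so it can be tried without a cluster:

```python
def make_keep_filter(allowed):
    """Return a function that keeps only the items found in `allowed`.

    `make_keep_filter` is a hypothetical helper name, not a PySpark API.
    """
    allowed_set = frozenset(allowed)  # hash-based membership checks

    def keep(values):
        # Preserve the original order of the values that survive the filter
        return [v for v in values if v in allowed_set]

    return keep


keep_frequent = make_keep_filter(["Cat", "Dog"])

print(keep_frequent(["Cat", "Dog", "Cow"]))  # ['Cat', 'Dog']
print(keep_frequent(["Cow", "Chicken"]))     # []

# With the RDD from the answer above, it would be applied as:
# rdd2 = rdd.mapValues(keep_frequent)
```

For a very large lookup set, a Spark broadcast variable may also be worth considering so the set is shipped to each executor once, though that is beyond what the question needs.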
