Tags: python, apache-spark, dictionary, pyspark, rdd

Checking items in a list against a PySpark RDD


I have the following PySpark RDD of IDs and their counts:

rdd = [('12', 560), ('34', 900), ('56', 800), ('78', 100), ('910', 220), ('125', 410), ('111', 41), etc.]

And I have a regular list:

id_list = ['12', '125', '78']

I want a new list of (id, count) pairs containing only the ids from id_list, with their counts taken from the RDD.

So expected output:

new_list = [('12', 560), ('125', 410), ('78', 100)]

If rdd were a Python dictionary, I could loop over id_list, check whether each id is in the dictionary, and build a new list of (id, count) pairs. But I'm lost on how to do this with an RDD. Please advise.

I could convert the RDD into a dictionary, but that would defeat the purpose of using Spark.
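
For reference, here is roughly what I have in mind for the dictionary case (a plain-Python sketch; counts is a hypothetical dict holding the same data):

counts = {'12': 560, '34': 900, '56': 800, '78': 100, '910': 220, '125': 410, '111': 41}
id_list = ['12', '125', '78']

# Look each id up in the dict, skipping any ids that are missing
new_list = [(i, counts[i]) for i in id_list if i in counts]
# [('12', 560), ('125', 410), ('78', 100)]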


Solution

  • You can filter the RDD with a lambda function that checks whether each element's key is in id_list:

    rdd2 = rdd.filter(lambda x: x[0] in id_list)
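
    A minimal end-to-end sketch, assuming a local SparkContext (variable names are illustrative):

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # (id, count) pairs from the question
    rdd = sc.parallelize([('12', 560), ('34', 900), ('56', 800),
                          ('78', 100), ('910', 220), ('125', 410), ('111', 41)])
    id_list = ['12', '125', '78']

    # Keep only the pairs whose key appears in id_list
    rdd2 = rdd.filter(lambda x: x[0] in id_list)

    print(rdd2.collect())
    # [('12', 560), ('78', 100), ('125', 410)] -- order follows the RDD, not id_list

    If id_list is large, build a set first (id_set = set(id_list)) so the membership test inside the lambda is O(1), or ship it to the executors with sc.broadcast and check x[0] in broadcast_ids.value.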