Tags: python, apache-spark, filter, pyspark, rdd

Filter RDD of key/value pairs based on value equality in PySpark


Given

[('Project', 10),
 ("Alice's", 11),
 ('in', 401),
 ('Wonderland,', 3),
 ('Lewis', 10),
 ('Carroll', 4),
 ('', 2238),
 ('is', 10),
 ('use', 24),
 ('of', 596),
 ('anyone', 4),
 ('anywhere', 3),
 ...]

where the value in each pair is the word's frequency.

I would like to return only the words that appear 10 times. Expected output:

 [('Project', 10),
  ('Lewis', 10),
  ('is', 10)]

I tried using

rdd.filter(lambda words: (words,10)).collect()

But it still shows the entire list. How should I go about this?


Solution

  • Your lambda function is wrong; it should be:

    rdd.filter(lambda words: words[1] == 10).collect()
    

    For example,

    my_rdd = sc.parallelize([('Project', 10), ("Alice's", 11), ('in', 401), ('Wonderland,', 3), ('Lewis', 10), ('is', 10)])
    
    >>> my_rdd.filter(lambda w: w[1] == 10).collect()
    [('Project', 10), ('Lewis', 10), ('is', 10)]
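
    As for why your original attempt returned the entire list: a lambda that returns the tuple `(words, 10)` is always truthy (any non-empty tuple is), so `filter` keeps every element instead of testing the count. The same behavior can be reproduced with plain Python's built-in `filter`, no Spark required:

    ```python
    data = [('Project', 10), ("Alice's", 11), ('in', 401),
            ('Wonderland,', 3), ('Lewis', 10), ('is', 10)]

    # The original predicate returns the tuple (words, 10), which is
    # always truthy, so filter() keeps every element unchanged.
    kept_all = list(filter(lambda words: (words, 10), data))
    print(kept_all == data)   # True: nothing was filtered out

    # The corrected predicate compares the value (index 1) against 10.
    tens = list(filter(lambda words: words[1] == 10, data))
    print(tens)               # [('Project', 10), ('Lewis', 10), ('is', 10)]
    ```

    RDD.filter follows the same rule as Python's filter: an element is kept whenever the predicate's return value is truthy, so the predicate must be an actual boolean test on the value.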