
How can I group multiple RDDs by key without aggregating the original RDD's records?


I have two RDDs that share common keys, in a format like:

 x = sc.parallelize([("A", 1), ("B", 4), ("A", 2)])
 y = sc.parallelize([("A", -1), ("B", 5)])

Then I want to group them by the common keys, "A" and "B".

I have tried the command below:

 z = [(k, tuple(map(list, v))) for k, v in sorted(x.cogroup(y).collect())]
 print(z)

What I got is

[('A', ([1, 2], [-1])), ('B', ([4], [5]))]

However, what I want is

[('A', ([1], [-1])), ('B', ([4], [5])), ('A', ([2], [-1]))]

How can I change the code to get the output shown above? Thank you.


Solution

  • You can do this with a straight join:

    print(x.join(y).collect())
    #[('A', (1, -1)), ('A', (2, -1)), ('B', (4, 5))]
    

    Add in a call to mapValues if you want the elements of the tuples to be lists:

    print(x.join(y).mapValues(lambda a: tuple([b] for b in a)).collect())
    #[('A', ([1], [-1])), ('A', ([2], [-1])), ('B', ([4], [5]))]
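
    For reference, here is a minimal, self-contained sketch of the whole pipeline (assuming a local SparkContext; the app name and variable names are illustrative):

    from pyspark import SparkContext

    sc = SparkContext("local", "join-example")  # hypothetical local context for illustration

    x = sc.parallelize([("A", 1), ("B", 4), ("A", 2)])
    y = sc.parallelize([("A", -1), ("B", 5)])

    # join emits one record per (left value, right value) pairing for each key,
    # so the two 'A' records from x stay separate
    joined = x.join(y)
    print(joined.collect())
    # e.g. [('A', (1, -1)), ('A', (2, -1)), ('B', (4, 5))] -- order may vary

    # wrap each element of the value tuple in a single-item list
    wrapped = joined.mapValues(lambda a: tuple([b] for b in a))
    print(wrapped.collect())
    # e.g. [('A', ([1], [-1])), ('A', ([2], [-1])), ('B', ([4], [5]))]

    sc.stop()

    This also shows why cogroup could not produce the desired output: cogroup gathers all values for a key on each side into a single record (hence ([1, 2], [-1])), while join produces one record per cross-pair.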