Select elements (x, y) from an RDD for which (y, x) is also present in the RDD

I have the following rdd:

[('K', ' M'),
 ('K', ' H'),
 ('M', ' K'),
 ('M', ' E'),
 ('H', ' F'),
 ('B', ' T'),
 ('B', ' H'),
 ('E', ' K'),
 ('E', ' H'),
 ('F', ' K'),
 ('F', ' H'),
 ('F', ' E'),
 ('A', ' Z')]

I want to keep only the elements (x, y) for which (y, x) is also present in the rdd. For my example, the output should be:

[(K,M),
 (H,F)]

Thanks for help

Solution

You can put each tuple in a canonical order, count how many tuples share each ordered key, and then keep only the keys that appear more than once:

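(rdd.map(lambda t: tuple(s.strip() for s in t))  # drop the stray leading spaces in the sample data
    .groupBy(lambda t: (min(t), max(t)))         # (x, y) and (y, x) get the same key
    .mapValues(len)                              # number of tuples per key
    .filter(lambda t: t[1] > 1)                  # keep keys seen more than once
    .map(lambda t: t[0])
    .collect())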

# [('F', 'H'), ('K', 'M')]
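
The same order-then-count idea can be checked locally without a Spark session. The following is only a minimal sketch in plain Python, not part of the original answer: the helper name `symmetric_pairs` is hypothetical, and it assumes that the leading spaces in the sample data are unintended and that an exact duplicate of (x, y) should not count as a match for (y, x).

```python
from collections import Counter

def symmetric_pairs(pairs):
    """Return the ordered keys (x, y), x != y, for which both (x, y) and
    (y, x) occur in pairs. Self-pairs (x, x) are not reported."""
    # Strip stray whitespace and drop exact duplicates, so that a repeated
    # (x, y) is not mistaken for a matching (y, x).
    cleaned = {(a.strip(), b.strip()) for a, b in pairs}
    # Put each pair in a canonical order and count how often each key occurs.
    counts = Counter(tuple(sorted(p)) for p in cleaned)
    # A key counted twice means both orientations were present.
    return sorted(key for key, n in counts.items() if n > 1)

data = [('K', ' M'), ('K', ' H'), ('M', ' K'), ('M', ' E'), ('H', ' F'),
        ('B', ' T'), ('B', ' H'), ('E', ' K'), ('E', ' H'), ('F', ' K'),
        ('F', ' H'), ('F', ' E'), ('A', ' Z')]

result = symmetric_pairs(data)
print(result)  # [('F', 'H'), ('K', 'M')]
```

If duplicates can occur in the real data, the equivalent guard in the RDD version would be a `.distinct()` call before the `groupBy`; whether that is needed depends on the input.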
