Tags: apache-spark, pyspark, rdd, partitioning

RDD pyspark partitionBy - TypeError: 'int' object is not subscriptable


list_1 = [[6, [3, 8, 7]], [5, [9, 7, 3]], [6, [7, 8, 5]], [5, [6, 7, 2]]]

rdd1 = sc.parallelize(list_1)
newpairRDD = rdd1.partitionBy(2, lambda k: int(k[0]))
print("Partitions structure: {}".format(newpairRDD.glom().collect()))

I want to partition by key.

I am getting

TypeError: 'int' object is not subscriptable

What am I doing wrong?


Solution

  • The partitioning function provided to partitionBy operates only on the key of each entry of the RDD, i.e. the first element of each pair. Here the keys are the integers 6 and 5, so lambda k: int(k[0]) tries to index into an int with k[0], which raises TypeError: 'int' object is not subscriptable.

    If you simply want to partition by key, your lambda function should be an identity operation (a full runnable sketch follows below), e.g.

    newpairRDD = rdd1.partitionBy(2, lambda x: x)
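
    Putting it together, here is a minimal self-contained sketch. It assumes no SparkContext exists yet; in the PySpark shell or a notebook, sc is already provided and the getOrCreate line can be dropped. Because partitionBy assigns each pair to partition partitionFunc(key) % numPartitions, the identity function sends the key-6 pairs to partition 0 (6 % 2 == 0) and the key-5 pairs to partition 1 (5 % 2 == 1).

    from pyspark import SparkContext

    # Assumption: running as a standalone script; in a shell/notebook
    # `sc` already exists and this line can be skipped.
    sc = SparkContext.getOrCreate()

    list_1 = [[6, [3, 8, 7]], [5, [9, 7, 3]], [6, [7, 8, 5]], [5, [6, 7, 2]]]
    rdd1 = sc.parallelize(list_1)

    # partitionBy passes only the key (6 or 5) to the partitioning
    # function and places each pair in partitionFunc(key) % numPartitions.
    newpairRDD = rdd1.partitionBy(2, lambda x: x)

    # glom() gathers the elements of each partition into a list, making
    # the resulting partition layout visible.
    print("Partitions structure: {}".format(newpairRDD.glom().collect()))
    # Expected: one partition holds both key-6 pairs, the other both key-5 pairs.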