Tags: java, apache-spark, partition

Spark Custom Partitioning in Java


I want to write a custom partitioner in Spark, and I'm working in Java.

However, I've noticed that the JavaRDD class (and the Dataset class) doesn't have a partitionBy(Partitioner) method like in Scala. Only JavaPairRDD does. How am I supposed to partition RDDs or Datasets without this method?


Solution

  • How am I supposed to partition RDDs or Datasets without this method?

    You're not supposed to:

    • Datasets have no public concept of a Partitioner. Instead, you use the repartition method, which takes a number of partitions and an optional list of Columns. The partitioning method itself is not configurable: it always uses hash partitioning based on Murmur3 hash (see the first sketch after this list).

    • RDDs other than pair RDDs (JavaPairRDD in Java, RDD[(_, _)] in Scala) cannot be partitioned with a custom Partitioner at all. If you want to custom-partition another RDD, you have to convert it to a pair RDD first. If there is no natural key-value split, you can use the record itself as the key and null as the value (see the second sketch after this list).
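
    For the Dataset case, here is a minimal sketch of repartition in Java. The file name users.json and the userId column are placeholders; any Dataset<Row> works the same way:

    ```java
    import static org.apache.spark.sql.functions.col;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class RepartitionExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("repartition-example")
                    .master("local[*]")
                    .getOrCreate();

            // Placeholder input; any Dataset<Row> with a "userId" column behaves the same.
            Dataset<Row> df = spark.read().json("users.json");

            // 8 partitions, rows routed by hashing the "userId" column.
            // The hash function (Murmur3) is fixed; only the columns and
            // the partition count are under your control.
            Dataset<Row> byUser = df.repartition(8, col("userId"));

            System.out.println(byUser.rdd().getNumPartitions()); // prints 8

            spark.stop();
        }
    }
    ```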
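
    For the RDD case, here is a sketch of the record-as-key, null-as-value workaround with a toy custom Partitioner. The parity-by-length scheme is purely illustrative, not a recommended partitioning strategy:

    ```java
    import java.util.Arrays;

    import org.apache.spark.Partitioner;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class CustomPartitionerExample {

        // Toy custom Partitioner: routes records to one of two partitions
        // based on whether the string's length is even or odd.
        static class LengthParityPartitioner extends Partitioner {
            @Override
            public int numPartitions() {
                return 2;
            }

            @Override
            public int getPartition(Object key) {
                return ((String) key).length() % 2;
            }
        }

        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[*]", "custom-partitioner");

            JavaRDD<String> words = sc.parallelize(Arrays.asList("spark", "java", "partition"));

            // JavaRDD has no partitionBy, so lift each record into a pair:
            // the record becomes the key, null the value.
            JavaPairRDD<String, Void> pairs =
                    words.mapToPair(w -> new Tuple2<>(w, (Void) null));

            // Now partitionBy(Partitioner) is available.
            JavaPairRDD<String, Void> partitioned =
                    pairs.partitionBy(new LengthParityPartitioner());

            // Drop the dummy values to get back a plain JavaRDD.
            JavaRDD<String> result = partitioned.keys();

            System.out.println(result.getNumPartitions()); // prints 2

            sc.stop();
        }
    }
    ```

    Note that keys() after partitionBy returns a plain JavaRDD again, so the placement produced by the custom Partitioner is preserved only until the next shuffle.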