java apache-spark k-means apache-spark-mllib

Spark KMeans produces deterministic results and not random

I am running Spark KMeans and would like it to use a different random seed on every run, so that I get different results each time. However, that is not happening. This is the code I am using:

KMeans kmeans = new KMeans().setK(4).setInitMode("random");
KMeansModel model = kmeans.fit(ds);
Dataset<Row> predictions = model.transform(ds);

I always get the same score and the same clusters. Am I missing something in the code?

Solution

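You are not setting a seed. The spark.ml KMeans estimator has a seed parameter whose default is a fixed value derived from the class name. If you never call setSeed, every run uses that same seed, picks the same initial centers, and converges to the same clusters and score. This applies to both "random" and the default "k-means||" init mode. To get different results, pass a seed that changes between runs, for example the current time: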

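import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Use a seed that changes on every run (the built-in default seed is a constant)
long seed = System.currentTimeMillis();

// Create the KMeans instance with the per-run seed; ds is your input Dataset with a "features" column
KMeans kmeans = new KMeans().setK(4).setInitMode("random").setSeed(seed);
KMeansModel model = kmeans.fit(ds);
Dataset<Row> predictions = model.transform(ds);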
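To see why the seed alone decides the outcome, here is a plain-Java sketch that needs no Spark. It loosely mimics initMode("random"): it picks k input points as starting centers with java.util.Random and then runs ordinary Lloyd iterations. The class SeededKMeansDemo and the method kMeans are hypothetical names for illustration only. This is not Spark's implementation, since Spark samples and iterates differently and runs distributed, but the mechanism is the same: everything after the seed is deterministic, so a fixed seed reproduces the same centers every time.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class SeededKMeansDemo {

    // Hypothetical plain Lloyd's k-means with "random" initialization.
    // Assumes 0 < k <= points.length and that all points share the same dimension.
    static double[][] kMeans(double[][] points, int k, long seed, int maxIter) {
        Random rng = new Random(seed);
        int n = points.length, dim = points[0].length;

        // Partial Fisher-Yates shuffle: choose k distinct points as starting centers.
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
        }
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = points[idx[c]].clone();

        int[] assign = new int[n];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: nearest center by squared Euclidean distance.
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int t = 0; t < dim; t++) {
                        double diff = points[i][t] - centers[c][t];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                assign[i] = best;
            }
            // Update step: move each center to the mean of its points; empty clusters keep their center.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int t = 0; t < dim; t++) sums[assign[i]][t] += points[i][t];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;
                for (int t = 0; t < dim; t++) centers[c][t] = sums[c][t] / counts[c];
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5},
                              {9, 0}, {9, 1}, {10, 0}, {0, 9}, {1, 9}, {0, 10} };

        // Same fixed seed twice: every random draw repeats, so the resulting centers match.
        long fixedSeed = 42L;
        double[][] a = kMeans(points, 4, fixedSeed, 20);
        double[][] b = kMeans(points, 4, fixedSeed, 20);
        System.out.println("fixed seed, identical runs: " + Arrays.deepEquals(a, b));

        // Fresh seed per run; print it so a run you like can be replayed later.
        long runSeed = ThreadLocalRandom.current().nextLong();
        System.out.println("run seed = " + runSeed);
        System.out.println(Arrays.deepToString(kMeans(points, 4, runSeed, 20)));
    }
}
```

Printing the per-run seed is a cheap way to keep runs reproducible: passing the logged value back to setSeed replays that run. ThreadLocalRandom is used here instead of System.currentTimeMillis() so that two jobs started in the same millisecond still get different seeds; for a single interactive job, either choice works.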
