There is a filter in WEKA in the preprocess tab named Randomized. The definition of this filter is Randomly shuffles the order of instances passed through it. The filter has a parameter called randomseed which is by default set as 42.
I found some definition of randomseed sunch as A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator.
Seed function is used to save the state of a random function, so that it can generate same random numbers on multiple executions of the code on the same machine or on different machines (for a specific seed value). The seed value is the previous value number generated by the generator.
The number "42" was apparently chosen as a tribute to the "Hitch-hiker's Guide" books by Douglas Adams, as it was supposedly the answer to the great question of "Life, the universe, and everything" as calculated by a computer (named "Deep Thought") created specifically to solve it.
All those answers on the internet made me more confused.
I cannot understand what randomseed will do with random shuffle? Thus this means, 42 instances will take from the beginning of the instances and shuffle. Then again 42 instances will be taken and shuffled and the process will continue till the end?
The seed
value simply initializes the java.util.Random object that generates the sequence of pseudo random numbers used for randomization.
A different seed value will result in a different sequence, therefore resulting in a different randomization of rows in your dataset.
The Randomize filter initializes such a Random
object and then calls the randomize
method of the weka.core.Instances object it currently holds. The code of the randomize
method (at time of writing this is Weka 3.9.6) looks like this:
public void randomize(Random random) {
for (int j = numInstances() - 1; j > 0; j--) {
swap(j, random.nextInt(j + 1));
}
}
All options of option-handling classes in Weka have a default value. In case of the Randomize
filter, the seed value has the default value of 42. It could have been anything, but someone was a fan of the Hitchhiker's Guide to the Galaxy and chose that value.