I'm working on a clustering program, and have a dataset of doubles that I need to normalize in order to make sure that every double (variable) has the same influence.
I would like to use min-max normalization, where the min and max values are determined for every variable, but I'm not sure how to implement this on my dataset in Java. Does anyone have any suggestions?
The Encog Project wiki gives a utility class that does range normalization.
The constructor takes the high and low values for input and normalized data.
/**
* Construct the normalization utility, allow the normalization range to be specified.
* @param dataHigh The high value for the input data.
* @param dataLow The low value for the input data.
* @param normalizedHigh The high value for the normalized data.
* @param normalizedLow The low value for the normalized data.
*/
public NormUtil(double dataHigh, double dataLow, double normalizedHigh, double normalizedLow) {
    this.dataHigh = dataHigh;
    this.dataLow = dataLow;
    this.normalizedHigh = normalizedHigh;
    this.normalizedLow = normalizedLow;
}
You can then use the normalize method on a sample.
/**
* Normalize x.
* @param x The value to be normalized.
* @return The result of the normalization.
*/
public double normalize(double x) {
    return ((x - dataLow)
            / (dataHigh - dataLow))
            * (normalizedHigh - normalizedLow) + normalizedLow;
}
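For illustration, here is a minimal self-contained sketch that assembles the two fragments above into something runnable. The class name RangeNormalizer, the field declarations, and the main method are my own assumptions and are not taken from the Encog wiki; only the constructor parameters and the formula come from the snippets above.

```java
// Hypothetical stand-in for the utility class quoted above.
// The field declarations and the class name are assumptions.
class RangeNormalizer {
    private final double dataHigh;
    private final double dataLow;
    private final double normalizedHigh;
    private final double normalizedLow;

    RangeNormalizer(double dataHigh, double dataLow,
                    double normalizedHigh, double normalizedLow) {
        this.dataHigh = dataHigh;
        this.dataLow = dataLow;
        this.normalizedHigh = normalizedHigh;
        this.normalizedLow = normalizedLow;
    }

    // Same formula as the normalize method above.
    // Note: dataHigh must differ from dataLow, otherwise this divides by zero.
    double normalize(double x) {
        return ((x - dataLow) / (dataHigh - dataLow))
                * (normalizedHigh - normalizedLow) + normalizedLow;
    }

    public static void main(String[] args) {
        // Input values are expected in [0, 10]; output is scaled to [0, 1].
        RangeNormalizer norm = new RangeNormalizer(10.0, 0.0, 1.0, 0.0);
        System.out.println(norm.normalize(0.0));  // prints 0.0
        System.out.println(norm.normalize(5.0));  // prints 0.5
        System.out.println(norm.normalize(10.0)); // prints 1.0
    }
}
```

For clustering you would typically create one such normalizer per variable, each built from that variable's own min and max.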
To find the minimum and maximum of each variable in your dataset, see the answers to the question "Finding the max/min value in an array of primitives using Java".
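As a rough sketch of how this could fit together for a whole dataset, the example below finds the min and max of every column with a plain loop and rescales each column to [0, 1]. It assumes the data is a rectangular double[][] with one row per sample and one column per variable; the class name MinMaxColumns and the method normalizeColumns are hypothetical, and mapping a constant column to 0.0 is just one possible choice.

```java
import java.util.Arrays;

// Hypothetical helper: per-column min-max normalization to [0, 1].
class MinMaxColumns {

    // Assumes every row has the same length as the first row.
    static double[][] normalizeColumns(double[][] data) {
        if (data.length == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find the min and max of each column (variable).
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Second pass: rescale each value; the input array is left untouched.
        double[][] out = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                // A constant column has zero range; map it to 0.0 (an assumption).
                out[i][j] = range == 0.0 ? 0.0 : (data[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 10.0},
            {3.0, 20.0},
            {5.0, 30.0}
        };
        // prints [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
        System.out.println(Arrays.deepToString(normalizeColumns(data)));
    }
}
```

Because each column is scaled by its own range, a variable measured in thousands no longer dominates a distance computation over one measured in fractions.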