java double cluster-computing normalization

Normalization of a dataset in Java

I'm working on a clustering program and have a dataset of doubles that I need to normalize so that every variable has the same influence.

I would like to use min-max normalization, where the min and max values are determined for every variable, but I'm not sure how to implement this for my dataset in Java. Does anyone have any suggestions?

Solution

The Encog Project wiki gives a utility class that does range normalization.

The constructor takes the high and low values for input and normalized data.

/**
 * Construct the normalization utility, allow the normalization range to be specified.
 * @param dataHigh The high value for the input data.
 * @param dataLow The low value for the input data.
 * @param normalizedHigh The high value for the normalized data.
 * @param normalizedLow The low value for the normalized data.
 */
public NormUtil(double dataHigh, double dataLow, double normalizedHigh, double normalizedLow) {
    this.dataHigh = dataHigh;
    this.dataLow = dataLow;
    this.normalizedHigh = normalizedHigh;
    this.normalizedLow = normalizedLow;
}

You can then use the normalize method on a sample.

/**
 * Normalize x.
 * @param x The value to be normalized.
 * @return The result of the normalization.
 */
public double normalize(double x) {
    return ((x - dataLow) 
            / (dataHigh - dataLow))
            * (normalizedHigh - normalizedLow) + normalizedLow;
}
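
As a worked illustration of the formula above, here is a minimal self-contained sketch. The class name RangeNormalizer is hypothetical (it is not the Encog class), and the zero-range check is an added assumption: the formula divides by dataHigh - dataLow, so a variable whose min and max are equal would otherwise produce NaN or infinity.

```java
// Hypothetical stand-alone helper that mirrors the normalize() formula above.
final class RangeNormalizer {
    private final double dataHigh;
    private final double dataLow;
    private final double normalizedHigh;
    private final double normalizedLow;

    RangeNormalizer(double dataHigh, double dataLow,
                    double normalizedHigh, double normalizedLow) {
        // Added guard (assumption): reject an empty input range up front.
        if (dataHigh == dataLow) {
            throw new IllegalArgumentException("dataHigh and dataLow must differ");
        }
        this.dataHigh = dataHigh;
        this.dataLow = dataLow;
        this.normalizedHigh = normalizedHigh;
        this.normalizedLow = normalizedLow;
    }

    // Map x from [dataLow, dataHigh] to [normalizedLow, normalizedHigh].
    double normalize(double x) {
        return ((x - dataLow) / (dataHigh - dataLow))
                * (normalizedHigh - normalizedLow) + normalizedLow;
    }

    public static void main(String[] args) {
        // Input range 0..100 mapped to -1..1.
        RangeNormalizer n = new RangeNormalizer(100.0, 0.0, 1.0, -1.0);
        System.out.println(n.normalize(25.0)); // prints -0.5
    }
}
```

Here 25 lies a quarter of the way through the input range, so it lands a quarter of the way through the target range: 0.25 * 2 + (-1) = -0.5.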

To find the minimum and the maximum of your dataset, use one of the answers to this question: Finding the max/min value in an array of primitives using Java.
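
Putting the two steps together, one possible sketch for a whole dataset looks like this. It assumes the data is a rectangular double[][] in which each row is a sample and each column is a variable, and it rescales every column to [0, 1]. The class and method names are hypothetical, and mapping a constant column to 0.0 is a design choice made for this sketch, not something the original answer specifies.

```java
import java.util.Arrays;

// Hypothetical helper: per-column min-max normalization to [0, 1].
final class MinMaxScaler {

    // Returns a new array; the input is left unchanged.
    static double[][] normalizeColumns(double[][] data) {
        if (data.length == 0) {
            return new double[0][0];
        }
        int cols = data[0].length; // assumes every row has this many columns

        // First pass: find the min and max of each column.
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Second pass: rescale each value with its own column's range.
        double[][] out = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                // A constant column has no spread; map it to 0.0 (assumption).
                out[i][j] = range == 0.0 ? 0.0 : (data[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0} };
        double[][] scaled = normalizeColumns(data);
        System.out.println(Arrays.toString(scaled[1])); // prints [0.5, 0.5]
    }
}
```

Because each column is scaled by its own min and max, variables measured on very different scales end up contributing comparably to the distance computations used in clustering.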