Tags: java, double, cluster-computing, normalization

Normalization of a dataset in Java


I'm working on a clustering program, and have a dataset of doubles that I need to normalize in order to make sure that every double (variable) has the same influence.

I would like to use min-max normalization where for every variable the min and max value are determined, but I'm not sure how I could implement this on my dataset in Java. Does anyone have any suggestions?


Solution

  • The Encog Project wiki gives a utility class that does range normalization.

    The constructor takes the high and low values for input and normalized data.

    /**
     * Construct the normalization utility, allowing the normalization range to be specified.
     * @param dataHigh The high value for the input data.
     * @param dataLow The low value for the input data.
     * @param normalizedHigh The high value for the normalized data.
     * @param normalizedLow The low value for the normalized data.
     */
    public NormUtil(double dataHigh, double dataLow, double normalizedHigh, double normalizedLow) {
        this.dataHigh = dataHigh;
        this.dataLow = dataLow;
        this.normalizedHigh = normalizedHigh;
        this.normalizedLow = normalizedLow;
    }

    You can then use the normalize method on a sample.

    /**
     * Normalize x.
     * @param x The value to be normalized.
     * @return The result of the normalization.
     */
    public double normalize(double x) {
        return ((x - dataLow) 
                / (dataHigh - dataLow))
                * (normalizedHigh - normalizedLow) + normalizedLow;
    }
    

    To find the minimum and maximum of your dataset, use one of the answers to this question: Finding the max/min value in an array of primitives using Java.
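
Since the question asks for a min and max per variable, the two steps above (find the bounds, then apply the normalize formula) could be combined roughly as follows. This is only a sketch under some assumptions: the dataset is a rectangular `double[][]` with one row per sample and one column per variable, and the class name `MinMaxNormalizer` and method `normalizeColumns` are made up for illustration and are not part of Encog. A column whose values are all equal would cause a division by zero, so here it is mapped to the low bound, which is one possible choice among several.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Encog): min-max normalization applied per column. */
final class MinMaxNormalizer {

    private MinMaxNormalizer() { }

    /**
     * Rescales every column of data into [normalizedLow, normalizedHigh].
     * Rows are samples and columns are variables; all rows are assumed
     * to have the same length. The input array is not modified.
     */
    static double[][] normalizeColumns(double[][] data, double normalizedLow, double normalizedHigh) {
        if (data.length == 0) {
            return new double[0][];
        }
        int cols = data[0].length;

        // First pass: find the min and max of each variable (column).
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Second pass: same formula as the normalize method, using each column's own bounds.
        double[][] result = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                result[i][j] = (range == 0.0)
                        ? normalizedLow // assumption: a constant column maps to the low bound
                        : (data[i][j] - min[j]) / range * (normalizedHigh - normalizedLow) + normalizedLow;
            }
        }
        return result;
    }
}
```

Calling `MinMaxNormalizer.normalizeColumns(data, 0.0, 1.0)` would then give every variable the range 0 to 1, so that no single variable dominates the distance computation during clustering.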