bash, shell, awk, text-processing, gawk

Min-Max Normalization using AWK


I don't know why I am unable to loop through all the records. Currently my program only processes the last record and prints the normalized values for it.

Normalization formula:

New_Value = (value - min[i]) / (max[i] - min[i])

Program

{
    for(i = 1; i <= NF; i++)
    {
        if (min[i]==""){  min[i]=$i;}     #initialise min
        if (max[i]==""){  max[i]=$i;}     #initialise max
        if ($i<min[i]) {  min[i]=$i;}     #new min
        if ($i>max[i]) {  max[i]=$i;}     #new max
    }

}
END {
    for(j = 1; j <= NF; j++)
    {
        normalized_value[j] = ($j - min[j])/(max[j] - min[j]);
        print $j, normalized_value[j];
    }
}

Dataset

4 14 24 34
3 13 23 33 
1 11 21 31
2 12 22 32
5 15 25 35

Current Output

5 1
15 1
25 1
35 1

Required Output

0.75 0.75 0.75 0.75
0.50 0.50 0.50 0.50 
0.00 0.00 0.00 0.00
0.25 0.25 0.25 0.25
1.00 1.00 1.00 1.00
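
As a sanity check on the required output, the formula can be applied by hand to the first column of the dataset, whose minimum is 1 and maximum is 5. The snippet below is only an illustrative sketch: the hard-coded values and the variable name `first_column` are assumptions made for this example and are not part of the program above.

```shell
# Apply (value - min) / (max - min) to the first column: 4 3 1 2 5.
# min=1 and max=5 are hard-coded here purely for illustration.
first_column=$(awk 'BEGIN {
    min = 1; max = 5
    n = split("4 3 1 2 5", v, " ")
    for (k = 1; k <= n; k++)
        printf "%d -> %.2f\n", v[k], (v[k] - min) / (max - min)
}')
printf '%s\n' "$first_column"
```

This prints `4 -> 0.75`, `3 -> 0.50`, `1 -> 0.00`, `2 -> 0.25` and `5 -> 1.00`, matching the first column of the required output.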

Solution

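  • In your END block, $j still refers to the fields of the last record read, so only that row gets normalized. I would process the file twice: once to determine the minima and maxima, and once to calculate the normalized values: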

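    awk '
        # First pass, first line: seed the min and max of every column.
        NR==1 {
            for (i=1; i<=NF; i++) {
                min[i]=$i
                max[i]=$i
            }
            next
        }
        # Rest of the first pass (NR==FNR holds only while reading the first file argument).
        NR==FNR {
            for (i=1; i<=NF; i++) {
                if      ($i < min[i]) {min[i]=$i}
                else if ($i > max[i]) {max[i]=$i}
            }
            next
        }
        # Second pass: print every value normalized to the range 0..1.
        # Note: a column whose values are all equal would divide by zero here.
        {
            for (i=1; i<=NF; i++) printf "%.2f%s", ($i-min[i])/(max[i]-min[i]), FS
            print ""
        }
    ' file file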
    # ^^^^ ^^^^  same file twice!
    

    outputs

    0.75 0.75 0.75 0.75 
    0.50 0.50 0.50 0.50 
    0.00 0.00 0.00 0.00 
    0.25 0.25 0.25 0.25 
    1.00 1.00 1.00 1.00
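
If reading the file twice is not an option (for example, when the data arrives on a pipe), a single-pass variant can buffer the values in memory and normalize them in END. The following is a minimal sketch rather than a drop-in replacement: the function name `minmax_normalize`, the sample file `data.txt`, and the choice to print 0 for a column whose values are all equal are assumptions made for this example.

```shell
# Hypothetical sample file reproducing the dataset from the question.
printf '%s\n' '4 14 24 34' '3 13 23 33' '1 11 21 31' '2 12 22 32' '5 15 25 35' > data.txt

# minmax_normalize FILE: single-pass min-max normalization (hypothetical helper name).
minmax_normalize() {
    awk '
        {
            # Buffer every field and track the per-column min and max.
            for (i = 1; i <= NF; i++) {
                val[NR, i] = $i
                if (!(i in min) || $i < min[i]) min[i] = $i
                if (!(i in max) || $i > max[i]) max[i] = $i
            }
            nf[NR] = NF
        }
        END {
            # All records are known now, so every buffered row can be normalized.
            for (r = 1; r <= NR; r++) {
                for (i = 1; i <= nf[r]; i++) {
                    range = max[i] - min[i]
                    # Assumption: a constant column prints 0 instead of dividing by zero.
                    norm = (range == 0) ? 0 : (val[r, i] - min[i]) / range
                    printf "%.2f%s", norm, (i < nf[r] ? OFS : ORS)
                }
            }
        }
    ' "$1"
}

minmax_normalize data.txt
```

For the dataset above this prints the same five rows as the two-pass version, without the trailing separator at the end of each line. The trade-off is memory: the whole input is held in the `val` array, so the two-pass approach remains preferable for very large files.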