Search code examples
awksumcomputed-field

count number of occurences and compute values in two files using awk


I have 2 files, file1 has 1M rows approx, and file2 around 1k rows.

file1 contains some fields (tower_id, user_id, signal_strength) and looks like this:

"0001","00abcde","0.65"
"0002","00abcde","0.35"
"0005","00bcdef","1.0"
"0001","00cdefg","0.1"
"0003","00cdefg","0.4"
"0008","00cdefg","0.3"
"0009","00cdefg","0.2"

file2 contains other fields (tower_id, x_position, y_position), and looks like this:

"0001","34","22"
"0002","78","56"
"0003","12","32"
"0004","79","45"
"0005","36","37"
"0006","87","99"
"0007","27","93"
"0008","55","04"
"0009","02","03"

The sum of signal_strength for each user_id is 1. I need to compute the user position, based on the signal strength for each tower, by counting the number of towers for each user, and compute the multiplication of the strength_signal by the tower_position values, like this:

"00abcde" --> 0.65*34+0.35*78, 0.65*22+0.35*56
"00bcdef" --> 1.0*36, 1.0*37
"00cdefg" --> 0.1*34+0.4*12+0.3*55+0.2*02, 0.1*22+0.4*32+0.3*04+0.2*03

So the output file should look something like this (user_id, computed_x_position, computed_y_position):

00abcde,49.4,33.9
00bcdef,36,37
00cdefg,25.1,16.8

My idea was to use awk, somehow use the "seen" feature and file1 and file2 as input files (like awk 'NR==FNR {some commands} {print some values}' file1 file2 > outputfile ), but I don't know how to do it. Anyone can help me?


Solution

  • This may be what you want:

    awk -F '[,"]+' '
        NR==FNR { towx[$2] = $3; towy[$2] = $4; next }
                { usrx[$3] += towx[$2] * $4; usry[$3] += towy[$2] * $4 }
        END     { for (usr in usrx) printf "%s,%.1f,%.1f\n",
                                           usr, usrx[usr], usry[usr] }
    ' file2 file1 # file2 precedes file1