I have 2 files, file1 has 1M rows approx, and file2 around 1k rows.
file1 contains some fields (tower_id, user_id, signal_strength) and looks like this:
"0001","00abcde","0.65"
"0002","00abcde","0.35"
"0005","00bcdef","1.0"
"0001","00cdefg","0.1"
"0003","00cdefg","0.4"
"0008","00cdefg","0.3"
"0009","00cdefg","0.2"
file2 contains other fields (tower_id, x_position, y_position), and looks like this:
"0001","34","22"
"0002","78","56"
"0003","12","32"
"0004","79","45"
"0005","36","37"
"0006","87","99"
"0007","27","93"
"0008","55","04"
"0009","02","03"
The sum of signal_strength for each user_id is 1. I need to compute the user position, based on the signal strength for each tower, by counting the number of towers for each user, and compute the multiplication of the strength_signal by the tower_position values, like this:
"00abcde" --> 0.65*34+0.35*78, 0.65*22+0.35*56
"00bcdef" --> 1.0*36, 1.0*37
"00cdefg" --> 0.1*34+0.4*12+0.3*55+0.2*02, 0.1*22+0.4*32+0.3*04+0.2*03
So the output file should look something like this (user_id, computed_x_position, computed_y_position):
00abcde,49.4,33.9
00bcdef,36,37
00cdefg,25.1,16.8
My idea was to use awk, somehow use the "seen" feature and file1 and file2 as input files (like awk 'NR==FNR {some commands} {print some values}' file1 file2 > outputfile
), but I don't know how to do it. Anyone can help me?
This may be what you want:
awk -F '[,"]+' '
NR==FNR { towx[$2] = $3; towy[$2] = $4; next }
{ usrx[$3] += towx[$2] * $4; usry[$3] += towy[$2] * $4 }
END { for (usr in usrx) printf "%s,%.1f,%.1f\n",
usr, usrx[usr], usry[usr] }
' file2 file1 # file2 precedes file1