How to merge two awk results with a twist?

I’m trying to graph some trends, so I wanted to “simplify” the manual data gathering process I do on a spreadsheet and get it running easily.

That is, I got the results from this, with headers Timestamp,Core #,100 - %Idle:

# sadf -d -- -P ALL -s 00:00:00 -e 01:30:00 | awk '{FS=";"} {if ($4 != -1) {print $3","$4","100 - $NF}}' | sed '1d' &>cpuresults.txt

2015-08-11 05:10:04 UTC,0,22.87
2015-08-11 05:10:04 UTC,1,15.99
2015-08-11 05:10:04 UTC,2,9.81
2015-08-11 05:10:04 UTC,3,7.95
2015-08-11 05:20:03 UTC,0,19.38
2015-08-11 05:20:03 UTC,1,15.63
2015-08-11 05:20:03 UTC,2,9.3
2015-08-11 05:20:03 UTC,3,6.59
2015-08-11 05:30:06 UTC,0,18.42
2015-08-11 05:30:06 UTC,1,16.7
2015-08-11 05:30:06 UTC,2,9.06
2015-08-11 05:30:06 UTC,3,6.22
2015-08-11 05:40:03 UTC,0,19.93
2015-08-11 05:40:03 UTC,1,16.27
2015-08-11 05:40:03 UTC,2,9.41
2015-08-11 05:40:03 UTC,3,7.8
2015-08-11 05:50:03 UTC,0,20.84
2015-08-11 05:50:03 UTC,1,15.86
2015-08-11 05:50:03 UTC,2,9.16
2015-08-11 05:50:03 UTC,3,8.33
2015-08-11 06:00:07 UTC,0,18.68
2015-08-11 06:00:07 UTC,1,16.93
2015-08-11 06:00:07 UTC,2,9.58
2015-08-11 06:00:07 UTC,3,6.45
2015-08-11 06:10:01 UTC,0,21.76
2015-08-11 06:10:01 UTC,1,15.4
2015-08-11 06:10:01 UTC,2,10.48
2015-08-11 06:10:01 UTC,3,8.65
2015-08-11 06:20:01 UTC,0,16.01
2015-08-11 06:20:01 UTC,1,14.35
2015-08-11 06:20:01 UTC,2,8.12
2015-08-11 06:20:01 UTC,3,5.89

And this, with headers Timestamp,Memory Used:

# sadf -dh -- -r -s 00:00:00 -e 01:30:00 | awk '{FS=";"} {print $3","$6}' | sed '1d' &>memresults.txt

2015-08-11 05:10:04 UTC,52.00
2015-08-11 05:20:03 UTC,51.93
2015-08-11 05:30:06 UTC,52.07
2015-08-11 05:40:03 UTC,52.02
2015-08-11 05:50:03 UTC,51.89
2015-08-11 06:00:07 UTC,52.03
2015-08-11 06:10:01 UTC,51.98
2015-08-11 06:20:01 UTC,51.98

And I‘m trying to get an output like this:

Timestamp,Memory,Core0,Core1,Core2,Core3
2015-08-11 05:10:04 UTC,52.00,22.87,15.99,9.81,7.95
2015-08-11 05:20:03 UTC,51.93,19.38,15.63,9.3,6.59
2015-08-11 05:30:06 UTC,52.07,18.42,16.7,9.06,6.22
2015-08-11 05:40:03 UTC,52.02,19.93,16.27,9.41,7.8
2015-08-11 05:50:03 UTC,51.89,20.84,15.86,9.16,8.33
2015-08-11 06:00:07 UTC,52.03,18.68,16.93,9.58,6.45
2015-08-11 06:10:01 UTC,51.98,21.76,15.4,10.48,8.65
2015-08-11 06:20:01 UTC,51.98,16.01,14.35,8.12,5.89

So, as you see, it’s simply transposing the last row of each CPU core, but I can’t see how to do it. Besides, there might be future cores.

I tried doing a loop on awk (with a cpucount variable as limit) as to print just the results for the maximum number of cores, matching the timestamp on both files, but I didn’t get far.

Can you please help me out with the awk formula to transpose the CPU cores values and add them to memresults.txt file so it ends up as the example I posted?

Thank you!

Solution

Just store the data in an array and print it later on:

awk -F, 'FNR==NR {data[$1] = (data[$1]?data[$1] FS $3:$3); next}
         {print $0 FS data[$1]}' cpu mem

This stores all the data from the cpu file into an array data[]. data [ first column ] keeps adding the content of the 3rd column.

Then, when you are looping through the mem file it is just a matter of printing the full line followed by the correspondent data from data [ 1st column ].

$ awk -F, 'FNR==NR {a[$1] = (a[$1]?a[$1] FS $3:$3); next} {print $0 FS a[$1]}' cpu mem
2015-08-11 05:10:04 UTC,52.00,22.87,15.99,9.81,7.95
2015-08-11 05:20:03 UTC,51.93,19.38,15.63,9.3,6.59
2015-08-11 05:30:06 UTC,52.07,18.42,16.7,9.06,6.22
2015-08-11 05:40:03 UTC,52.02,19.93,16.27,9.41,7.8
2015-08-11 05:50:03 UTC,51.89,20.84,15.86,9.16,8.33
2015-08-11 06:00:07 UTC,52.03,18.68,16.93,9.58,6.45
2015-08-11 06:10:01 UTC,51.98,21.76,15.4,10.48,8.65
2015-08-11 06:20:01 UTC,51.98,16.01,14.35,8.12,5.89

To have the header Timestamp,Memory,Core0,Core1,Core2,Core3 you can for example print it in the BEGIN block.