I have the following input file:
Unit1 15 00:20:58
Unit1 30 01:10:00
Unit3 10 00:20:15
Unit2 5 00:45:00
Unit3 20 00:30:00
Unit2 2 01:22:35
Unit2 3 01:35:22
Unit1 5 00:58:20
Some background on this input file: it is a list of work Units for an e-portal that I have been tasked with analyzing. The log file provides the Unit name ($1), the total number of questions that a student has completed ($2) before hitting submit, and the time recorded on submit ($3). The data has been tweaked to allow for a clearer example.
I would like to output the following:
Unit1
---------------------
00
========
20
--------
01
========
30
--------
Unit2
---------------------
00
========
5
--------
01
========
5
--------
Unit3
---------------------
00
========
30
--------
The code I currently have is as follows:
#!/usr/bin/gawk -f
{ #Start of MID
key = $1 #Message Extracted 10 Total
key2 = substr($3,1,2) #Hour
MSG_TYPE[key]++ #Distinct Message
HOUR_AR[key2]++
HT_AR[key2] += $2 #Tots up the total for each message by hour
} #End of MID
END {
for (MSG in MSG_TYPE) {
print MSG
print "-----------------------------------"
n=asorti(HOUR_AR, HOUR_SOR)
for (i = 1; i <= n; i++) {
print HOUR_SOR[i]
print "========="
print HOUR_AR[HOUR_SOR[i]]
print "---------"
}
print "\n"
}
} #End of END
The logic behind this code is that it gets all the unique values of $1 with the MSG_TYPE[] array. That array is then scanned in a for loop, which prints out each value. The hours are collected in the HOUR_AR[] array, which is then sorted. Each pass of the MSG for loop should, hopefully, return all the hours for that particular MSG and print the sum of $2 for that hour AND MSG.
I am sorry this is long-winded; I just wanted to provide enough detail. Any and all help is greatly appreciated.
For the given example, this code gives the output you expected:
awk -F'[ :]+' '
{ u[$1][$3] += $2 }    # u[unit][hour] = running total of $2
END {
    for (i in u) {
        print i; print "--------"
        for (j in u[i])
            print j "\n====\n" u[i][j] "\n---"
    }
}' file
It outputs:
Unit1
--------
00
====
20
---
01
====
30
---
Unit2
--------
00
====
5
---
01
====
5
---
Unit3
--------
00
====
30
---
Note that the sorting part is not done in this code, so the order of the units and hours is not guaranteed. But you get the idea: the implementation becomes much easier if you use GNU awk's arrays of arrays.
https://www.gnu.org/software/gawk/manual/html_node/Arrays-of-Arrays.html#Arrays-of-Arrays