awk gawk logparser

awk: create a list of destination ports seen for each source IP from a Bro log (conn.log)


I'm trying to solve a problem in awk as an exercise but I'm having trouble. I want awk (or gawk) to be able to print all unique destination ports for a particular source IP address.

The source IP address is field 1 ($1) and the destination port is field 4 ($4).

Cut for brevity:
SourceIP          SrcPort   DstIP           DstPort
192.168.1.195       59508   98.129.121.199  80
192.168.1.87        64802   192.168.1.2     53
10.1.1.1            41170   199.253.249.63  53
10.1.1.1            62281   204.14.233.9    443

I imagine you would store each source IP as an index into an array, but I'm not sure how you would store the destination ports as values. Maybe you can keep appending to a string that is the value at that index, e.g. "80," then "80,443," for each match. But maybe that's not the best solution.

I'm not too concerned about output, I really just want to see how one can approach this in awk. Though, for output I was thinking something like,

Source IP:dstport, dstport, dstport
192.168.1.195:80,443,8088,5900

I'm tinkering with something like this,

awk '{ if ( NR == 1) next; arr[$1,$4] = $4 } END { for (i in arr) print arr[i] }' infile

but I cannot figure out how to print the elements and their values for a two-dimensional array. Something along these lines seems to take care of the unique destination port task, because a repeated port just overwrites the value of the same element.
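For what it's worth, the comma in arr[$1,$4] does not create a true two-dimensional array: awk joins the two subscripts into a single string key, separated by the built-in SUBSEP variable. Below is a minimal sketch of getting both parts back in the END block. The temporary sample file (the four rows from the question plus one repeated row added for illustration) and the names key, part, and pairs are only illustrative, and the sort is there because awk does not guarantee any order for "for (key in arr)".

```shell
# Hypothetical sample data in the layout from the question (header + 4 columns).
infile=$(mktemp)
cat > "$infile" <<'EOF'
SourceIP          SrcPort   DstIP           DstPort
192.168.1.195       59508   98.129.121.199  80
192.168.1.87        64802   192.168.1.2     53
10.1.1.1            41170   199.253.249.63  53
10.1.1.1            62281   204.14.233.9    443
10.1.1.1            62290   204.14.233.9    443
EOF

# arr[$1,$4] holds one element per unique (source IP, port) pair.
# In END, split each combined key on SUBSEP to recover the two parts.
pairs=$(awk 'NR == 1 { next }                 # skip the column header
     { arr[$1,$4] = $4 }
     END {
         for (key in arr) {
             split(key, part, SUBSEP)         # part[1] = source IP, part[2] = port
             print part[1] ":" part[2]
         }
     }' "$infile" | LC_ALL=C sort)

printf '%s\n' "$pairs"
rm -f "$infile"
```

This prints one IP:port line per unique pair; grouping the ports of one IP onto a single line still needs a second array or a string that is appended to.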

Note: awk/gawk solution will get the answer!

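Solution EDIT: I slightly modified Kent's solution to print only unique destination ports, as mentioned in my question, and to skip the column header line. The accumulated list is wrapped in commas before the lookup so that port 80 is not mistaken for part of 8080, and a repeated port leaves the list untouched instead of resetting it.

awk 'NR == 1 { next } { if (!($1 in a)) a[$1] = $4; else if (!index("," a[$1] ",", "," $4 ",")) a[$1] = a[$1] "," $4 } END { for (x in a) print x ":" a[x] }' infile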

Solution

  • Here is one way with awk:

     awk '{k=$1;a[k]=a[k]?a[k]","$4:$4}END{for(x in a)print x":"a[x]}' file
    

    with your example, the output is:

    kent$ awk '{k=$1;a[k]=a[k]?a[k]","$4:$4}END{for(x in a)print x":"a[x]}' file
    192.168.1.195:80
    192.168.1.87:53
    10.1.1.1:53,443
    

    (I omitted the title line)

    EDIT

    k=$1;a[k]=a[k]?a[k]","$4:$4
    

    is exactly the same as:

    if (a[$1])                   # if a[$1] is not empty
        a[$1] = a[$1]","$4       # concatenate $4 to it separated by ","
    else                         # else if a[$1] is empty
        a[$1] = $4               # let a[$1]=$4
    

    I used k=$1 just to save some typing, together with the conditional (ternary) expression x=boolean?a:b.

    I hope this explanation helps you understand the code.