Tags: bash, parsing, awk, text-parsing

Bash + awk advanced parsing


I want to create a one-liner using awk with some logic. This is the command and the output I'm working with:

grep -v "local_address" --no-filename "/proc/net/tcp" "/proc/net/tcp6"
   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 7714 1 000000009796c30c 100 0 0 10 0
   1: 845DA8C0:1F91 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 66072 1 0000000080e84053 100 0 0 10 0
   2: 0100007F:86FD 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 476 1 000000005f80511b 100 0 0 10 0
   3: 00000000:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 29391 1 0000000098b1d0d1 100 0 0 10 0
   4: 00000000:1F92 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 66342 1 0000000004ea4afe 100 0 0 10 0
   5: 9F01A8C0:0016 0101A8C0:E488 01 00000000:00000000 02:000AD0D3 00000000     0        0 68121 2 000000008fdfa15d 20 4 16 10 -1
   0: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 7716 1 0000000037928e79 100 0 0 10 0

I'm writing a function to detect possible port conflicts, and I already have functions for converting IPs and ports from hex to decimal and vice versa. In this case, for example, IP 192.168.93.132 is 845DA8C0. So if I ask for possible port conflicts on this interface, the query should return all ports bound to this IP (845DA8C0) plus all ports bound to 00000000 (0.0.0.0 in decimal), since those also affect this interface and can conflict. In this case the correct answer is 0016, 1F91, 1F90 and 1F92 (not relevant here, but in decimal these are ports 22, 8081, 8080 and 8082). I also want the $4 column, which contains 0A or whatever the value is, because it is used to determine whether the port is in use. So the expected output is:

00160A
1F910A
1F900A
1F920A
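
For readers without such conversion functions at hand, the mapping from 192.168.93.132 to 845DA8C0 can be sketched as below. This is only an illustration, not the function the question refers to: `ip_to_proc_hex` is a hypothetical name, and the sketch assumes a little-endian host, where /proc/net/tcp shows the four octets in reverse order (as the example above implies).

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to the hex form
# shown in /proc/net/tcp. Assumes a little-endian host, so the octets are
# emitted in reverse order.
ip_to_proc_hex() {
    local o1 o2 o3 o4
    # Split the address on dots into its four octets.
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    printf '%02X%02X%02X%02X\n' "$o4" "$o3" "$o2" "$o1"
}

ip_to_proc_hex 192.168.93.132   # prints 845DA8C0
ip_to_proc_hex 0.0.0.0          # prints 00000000
```

The helper does no validation; it expects four decimal octets without leading zeros.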

My current one-liner adds every port to the array, no matter which interface it is on:

declare -a busy_ports=($(grep -v "local_address" --no-filename "/proc/net/tcp" "/proc/net/tcp6" | awk '{print $2$4}' | cut -d ":" -f 2 | sort -u))

With this incomplete approach, the output is:

001601
00160A
1F900A
1F910A
1F920A
86FD0A

I'd like to add a layer of logic at the awk level so that a line matches only if it has one of the given IP addresses (in this case 845DA8C0 or 00000000). Can anybody help with this?


Solution

  • To simulate OP's grep -v output:

    $ cat grep.out
       0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 7714 1 000000009796c30c 100 0 0 10 0
       1: 845DA8C0:1F91 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 66072 1 0000000080e84053 100 0 0 10 0
       2: 0100007F:86FD 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 476 1 000000005f80511b 100 0 0 10 0
       3: 00000000:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 29391 1 0000000098b1d0d1 100 0 0 10 0
       4: 00000000:1F92 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 66342 1 0000000004ea4afe 100 0 0 10 0
       5: 9F01A8C0:0016 0101A8C0:E488 01 00000000:00000000 02:000AD0D3 00000000     0        0 68121 2 000000008fdfa15d 20 4 16 10 -1
       0: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 7716 1 0000000037928e79 100 0 0 10 0
    

    One awk approach:

    cat grep.out | awk -v iplist='845DA8C0,00000000' '    # set awk variable "iplist" to set of ip addresses delimited by comma
    
    BEGIN { split(iplist,a,",")                           # split iplist into array a[]
            for (i in a) ips[a[i]]                        # convert a[1]=845DA8C0 to ips[845DA8C0]
          }
          { split($2,a,":")                               # split 2nd field on ":"
            if (a[1] in ips)                              # if 1st tuple is an index in ips[] array then ...
               ports[a[2] $4]                             # save 2nd tuple and 4th field as index in array ports[]; this will eliminate duplicates
          }
    END   { for (port in ports)                           # loop through indices of ports[] array and ...
                print port                                # print the port info to stdout
          }
    ' | sort                                              # output order from this awk script is not guaranteed so let "sort" provide the desired order
    

    This generates:

    00160A
    1F900A
    1F910A
    1F920A
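
Each entry printed by the script packs the four-digit hex port and the two-digit state from $4 into one string. In the Linux kernel's state numbering, 0A is TCP_LISTEN and 01 is TCP_ESTABLISHED. A minimal sketch of taking an entry apart again follows; `decode_entry` is a hypothetical helper name, not something from the question or the answer.

```shell
# Hypothetical helper: split an entry such as 1F910A into its decimal port
# and its two-character hex state.
decode_entry() {
    local entry=$1
    local port_hex=${entry%??}    # drop the last two characters (the state)
    local state=${entry#????}     # drop the first four characters (the port)
    # printf accepts a 0x-prefixed argument for %d and prints it in decimal.
    printf '%d %s\n' "0x$port_hex" "$state"
}

decode_entry 00160A   # prints: 22 0A
decode_entry 1F910A   # prints: 8081 0A
```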
    

    As pointed out in the comments, once awk is in the mix you can usually drop grep and cut, and you no longer need sort -u to filter out duplicates (sort is kept here only to order the output).

    Modifying the awk script to directly process the /proc/net files:

    awk -v iplist='845DA8C0,00000000' '
    
    BEGIN           { split(iplist,a,",")
                      for (i in a) ips[a[i]]
                    }
    /local_address/ { next }                         # skip processing lines that contain the string "local_address"
                    { split($2,a,":")
                      if (a[1] in ips)
                         ports[a[2] $4]
                    }
    END             { for (port in ports)
                          print port
                    }
    ' /proc/net/tcp /proc/net/tcp6 | sort
    

    Removing the comments and collapsing the script into a (harder to read) one-liner; append | sort if the output order matters:

    awk -v iplist='845DA8C0,00000000' 'BEGIN { split(iplist,a,","); for (i in a) ips[a[i]] } /local_address/ { next } { split($2,a,":"); if (a[1] in ips) ports[a[2] $4] } END { for (port in ports) print port }' /proc/net/tcp /proc/net/tcp6
    

    Squeezing out all unnecessary white space (at a further cost to readability):

    awk -v iplist='845DA8C0,00000000' 'BEGIN{split(iplist,a,",");for(i in a)ips[a[i]]}/local_address/{next}{split($2,a,":");if(a[1] in ips)ports[a[2]$4]}END{for(port in ports)print port}' /proc/net/tcp /proc/net/tcp6
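
To get the result back into the busy_ports array from the question, the script can be wrapped in a function and read with mapfile. This is a minimal sketch under a few assumptions: `busy_ports_for` is a hypothetical name, mapfile requires bash 4 or later, and the counted loop in BEGIN is only a stylistic variation on `for (i in a)`.

```shell
# Hypothetical wrapper around the awk script above.
#   $1      comma-separated list of hex IPs to match
#   $2...   files in /proc/net/tcp format
busy_ports_for() {
    local iplist=$1
    shift
    awk -v iplist="$iplist" '
        BEGIN           { n = split(iplist, a, ",")          # a[1..n] = hex IPs
                          for (i = 1; i <= n; i++) ips[a[i]] # use the IPs as array indices
                        }
        /local_address/ { next }                             # skip header lines
                        { split($2, a, ":")                  # a[1] = IP, a[2] = port
                          if (a[1] in ips) ports[a[2] $4]    # indices are unique, so no duplicates
                        }
        END             { for (port in ports) print port }
    ' "$@" | sort
}

# Usage on a Linux host (bash 4 or later):
#   mapfile -t busy_ports < <(busy_ports_for '845DA8C0,00000000' /proc/net/tcp /proc/net/tcp6)
#   printf '%s\n' "${busy_ports[@]}"
```

Passing the file names as arguments keeps the function easy to try against a saved copy such as grep.out before pointing it at the live /proc/net files.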