
Line manipulation & sorting


I am alright at writing Linux scripts but could use some advice. I know the problem is sort of vague, so if you can provide any help whatsoever I will appreciate it!

The following issue is for personal growth, and because I am writing some network tools for fun/learning. No homework involved (I'm a senior in college, none of my classes require this stuff!)

I am using tshark to get information about packet captures. This is what it looks like:

rachel@Ubuntu-1:~/PCAP$ tshark -r LargeTorrent.pcap -q -z io,phs

===================================================================
Protocol Hierarchy Statistics
Filter: 

eth                                      frames:4309 bytes:3984321
  ip                                     frames:4119 bytes:3969006
    icmp                                 frames:1316 bytes:1308988
    udp                                  frames:1408 bytes:1350786
      data                               frames:1368 bytes:1346228
      dns                                frames:16 bytes:1176
      nbns                               frames:14 bytes:1300
      http                               frames:8 bytes:1596
      nbdgm                              frames:2 bytes:486
        smb                              frames:2 bytes:486
          mailslot                       frames:2 bytes:486
            browser                      frames:2 bytes:486
    tcp                                  frames:1395 bytes:1309232
      data                               frames:1300 bytes:1294800
      http                               frames:6 bytes:3763
        data-text-lines                  frames:2 bytes:324
        xml                              frames:2 bytes:3205
          tcp.segments                   frames:1 bytes:787
      nbss                               frames:34 bytes:5863
        smb                              frames:17 bytes:3047
          pipe                           frames:4 bytes:686
            lanman                       frames:4 bytes:686
        smb2                             frames:13 bytes:2444
      bittorrent                         frames:10 bytes:1709
        tcp.segments                     frames:2 bytes:433
          bittorrent                     frames:2 bytes:433
            bittorrent                   frames:1 bytes:258
        bittorrent                       frames:2 bytes:221
          bittorrent                     frames:2 bytes:221
  arp                                    frames:146 bytes:8760
  ipv6                                   frames:44 bytes:6555
    udp                                  frames:40 bytes:6211
      dns                                frames:18 bytes:1711
      dhcpv6                             frames:14 bytes:2114
      http                               frames:6 bytes:1014
      data                               frames:2 bytes:1372
    icmpv6                               frames:4 bytes:344
===================================================================

What I would like for it to look like:

rachel@Ubuntu-1:~/PCAP$ tshark -r LargeTorrent.pcap -q -z io,phs

===================================================================
Protocol Hierarchy Statistics
Filter: 

Protocol                   Bytes
=====================================
eth                        3984321
  ip                       3969006
    icmp                   1308988
    udp                    1350786
      data                 1346228
      dns                  1176
      nbns                 1300
      http                 1596
      nbdgm                486
        smb                486
          mailslot         486
            browser        486
    tcp                    1309232
      data                 1294800
      http                 3763
        data-text-lines    324
        xml                3205
          tcp.segments     787
      nbss                 5863
        smb                3047
          pipe             686
            lanman         686
        smb2               2444
      bittorrent           1709
        tcp.segments       433
          bittorrent       433
            bittorrent     258
        bittorrent         221
          bittorrent       221
  arp                      8760
  ipv6                     6555
    udp                    6211
      dns                  1711
      dhcpv6               2114
      http                 1014
      data                 1372
    icmpv6                 344
===================================================================



Edit: I am going to add the original question for the purpose of making sense of the (great) answer that was provided.

Originally, I wanted to only print statistics for "leaves" because eth, ip, etc. are all parents and their statistics are not necessary for my purposes. In addition, instead of having a god-awful block of text with only spaces to show hierarchy, I wanted to erase all the statistics for parents, and show them as breadcrumbs behind the child.

Example:

eth                                      frames:4309 bytes:3984321
  ip                                     frames:4119 bytes:3969006
    icmp                                 frames:1316 bytes:1308988
    udp                                  frames:1408 bytes:1350786
      data                               frames:1368 bytes:1346228
      dns                                frames:16 bytes:1176

Should become

eth:ip:icmp - 1308988 bytes
eth:ip:udp:data - 1346228 bytes
eth:ip:udp:dns - 1176 bytes

To preserve the hierarchy and avoid printing useless statistics.

Anyway, the accepted answer by Etan solved this perfectly! For those of you at my level who are unsure how to proceed after reading it, these steps will help you finish up:

  1. Save the given script as a filename.awk file
  2. Save the block of text you want to manipulate as a filename.txt file
  3. Call awk -f filename.awk filename.txt
  4. Optionally redirect the output to a file ( awk -f filename.awk filename.txt > output.txt ; use >> instead of > to append )
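
The steps above can also be collapsed into a single pipeline that skips the intermediate text file. This is only a minimal sketch: it assumes the awk script from the answer is saved as filename.awk, the helper name phs_report is made up for illustration, and it uses only the tshark flags already shown in the question.

```shell
# phs_report PCAP_FILE AWK_SCRIPT
# Hypothetical helper: feeds tshark's protocol hierarchy statistics
# straight into an awk script instead of going through a .txt file.
phs_report() {
    tshark -r "$1" -q -z io,phs | awk -f "$2"
}

# Example call (file names are placeholders):
#   phs_report LargeTorrent.pcap filename.awk > output.txt
```

Wrapping the pipeline in a function keeps the capture file and the script as separate arguments, so the same awk script can be reused on other captures.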

Solution

  • The output I originally thought you wanted can be produced with this awk script. (It could probably be written more cleanly, but it seems to work well enough.)

    function entry() {
        # Don't want to print empty entries.
        if (ind[0]) {
            printf "%s", ind[0]
            for (i = 1; i <= ls; i++) {
                printf ":%s", ind[i]
            }
            split(b, a, /:/)
            printf " - %s %s\n", a[2], a[1]
        }
    }
    
    # Found our data marker. Note that and print the current line.
    $1 == "Filter:" {d=1; print; next}
    # Print lines until we see our data marker.
    !d {print; next}
    # Print empty lines.
    !NF {print; next}
    # Save our trailing line for later.
    /===/ {suf=$0; next}
    
    {
        # Save our previous indentation level.
        ls = s
        # Find our new indentation level (by where the first field starts).
        s = (match($0, /[^[:space:]]/)-1) / 2
    
        # If the current line is at or below the last indent level print the last line.
        if (s <= ls) {
            entry()
        }
    
        # Save the current line's byte count.
        b=$NF
        # Save the current line's field name.
        ind[s] = $1
    }
    
    END {
        # No later line follows the last data line, so sync the depth here;
        # otherwise stale, deeper entries would be appended to it.
        ls = s
        # Print a final line if we had one.
        entry()
        # Print the suffix line if we have one.
        if (suf) {
            print suf
        }
    }
    

    Which, on the sample input, gets you this output.

    ===================================================================
    Protocol Hierarchy Statistics
    Filter:
    
    eth:ip:icmp - 1308988 bytes
    eth:ip:udp:data - 1346228 bytes
    eth:ip:udp:dns - 1176 bytes
    eth:ip:udp:nbns - 1300 bytes
    eth:ip:udp:http - 1596 bytes
    eth:ip:udp:nbdgm:smb:mailslot:browser - 486 bytes
    eth:ip:tcp:data - 1294800 bytes
    eth:ip:tcp:http:data-text-lines - 324 bytes
    eth:ip:tcp:http:xml:tcp.segments - 787 bytes
    eth:ip:tcp:nbss:smb:pipe:lanman - 686 bytes
    eth:ip:tcp:nbss:smb2 - 2444 bytes
    eth:ip:tcp:bittorrent:tcp.segments:bittorrent:bittorrent - 258 bytes
    eth:ip:tcp:bittorrent:bittorrent:bittorrent - 221 bytes
    eth:arp - 8760 bytes
    eth:ipv6:udp:dns - 1711 bytes
    eth:ipv6:udp:dhcpv6 - 2114 bytes
    eth:ipv6:udp:http - 1014 bytes
    eth:ipv6:udp:data - 1372 bytes
    eth:ipv6:icmpv6 - 344 bytes
    ===================================================================
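
To see how the awk script derives the nesting level, here is a small sketch that runs the same match() arithmetic on three made-up lines. It assumes two spaces of indentation per level, as in the tshark output above. The helper name phs_depth and the sample values are invented, and /[^ ]/ stands in for [[:space:]] only because some older mawk versions do not support POSIX character classes.

```shell
# Print "<depth> <protocol>" for each input line, using the same
# (column of first non-space character - 1) / 2 calculation.
phs_depth() {
    awk '{ depth = (match($0, /[^ ]/) - 1) / 2; print depth, $1 }'
}

printf '%s\n' \
    'eth      frames:3 bytes:30' \
    '  ip     frames:2 bytes:20' \
    '    udp  frames:1 bytes:10' | phs_depth
```

This prints 0 eth, 1 ip and 2 udp on separate lines. The depth is what the script uses as the index into ind[], so each protocol name overwrites the previous name at the same level.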
    

    The output you describe in your edited question is probably easier to produce with sed, though.

    /Filter:/a \
    Protocol                   Bytes \
    =====================================
    s/frames:[^ ]*//
    s/               b/b/
    s/bytes:\([^ ]*\)/\1/
    

    Saved as, say, filename.sed and run with sed -f filename.sed filename.txt, it produces this output.

    ===================================================================
    Protocol Hierarchy Statistics
    Filter:
    Protocol                   Bytes
    =====================================
    
    eth                        3984321
      ip                       3969006
        icmp                   1308988
        udp                    1350786
          data                 1346228
          dns                  1176
          nbns                 1300
          http                 1596
          nbdgm                486
            smb                486
              mailslot         486
                browser        486
        tcp                    1309232
          data                 1294800
          http                 3763
            data-text-lines    324
            xml                3205
              tcp.segments     787
          nbss                 5863
            smb                3047
              pipe             686
                lanman         686
            smb2               2444
          bittorrent           1709
            tcp.segments       433
              bittorrent       433
                bittorrent     258
            bittorrent         221
              bittorrent       221
      arp                      8760
      ipv6                     6555
        udp                    6211
          dns                  1711
          dhcpv6               2114
          http                 1014
          data                 1372
        icmpv6                 344
    ===================================================================
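
The sed script relies on there being at least 15 spaces of padding in front of each bytes: field. As a possible alternative, the sketch below rebuilds each row with awk's printf instead, so the column position does not depend on the original padding. It assumes every data row ends in a bytes:N field as in the sample. The helper name phs_bytes is hypothetical, and the column width is simply taken from the desired layout in the question.

```shell
# phs_bytes (hypothetical name): reads "tshark -q -z io,phs" text on
# stdin and prints a Protocol/Bytes table, keeping the indentation.
phs_bytes() {
    awk '
        # After the "Filter:" line, emit the two header lines once.
        $1 == "Filter:" && !seen {
            seen = 1
            print
            printf "%-26s %s\n", "Protocol", "Bytes"
            print "====================================="
            next
        }
        # Copy anything that is not a "name frames:N bytes:N" row.
        $NF !~ /^bytes:/ { print; next }
        {
            match($0, /[^ ]/)                    # RSTART = column of the name
            name = substr($0, 1, RSTART - 1) $1  # indentation + protocol name
            bytes = $NF
            sub(/^bytes:/, "", bytes)            # keep only the number
            printf "%-26s %s\n", name, bytes
        }
    '
}

# Example call (file name is a placeholder):
#   tshark -r LargeTorrent.pcap -q -z io,phs | phs_bytes
```

Padding with printf keeps the Bytes column aligned even if a future tshark version changes the spacing, at the cost of a slightly longer script than the sed version.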