bash debian gzip tcpdump

Write tcpdump output to a compressed (gzipped) file


I want to write the textual output of tcpdump to a compressed file.

First I tried the most obvious:

# tcpdump -l -i eth0 | gzip -c > test.gz
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C63 packets captured
244 packets received by filter
0 packets dropped by kernel
4 packets dropped by interface

# file test.gz
test.gz: empty
# 

Then I found the following solution for Debian 9 (Stretch):

# tcpdump -l -i eth0 | ( gzip -c > test.gz & )
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C150 packets captured
160 packets received by filter
0 packets dropped by kernel

# file test.gz 
test.gz: gzip compressed data, last modified: Wed May 23 12:56:16 2018, from Unix
# 

This works fine on Debian 9 (Stretch) but not on Debian 8 (Jessie):

# tcpdump -l -i eth0 | ( gzip -c > test.gz & )
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
tcpdump: Unable to write output: Broken pipe
# 

Two questions:

  1. What's wrong with the 'obvious solution'?
  2. How can I capture and gzip the tcpdump output on Debian Jessie? (The obvious solution doesn't work there either.)

Thanks!


Solution

  • What Was Happening

    To explain what happens here:

    • Ctrl+C sends a SIGINT to the entire foreground process group. That means it doesn't just terminate tcpdump, but also terminates gzip. (The workarounds you were attempting try to avoid this by pushing gzip into the background, where the Ctrl+C no longer kills it.)
    • stdout is line-buffered by default only when output goes to a TTY; when output goes to a pipe or FIFO, it's block-buffered, which is more efficient because the left-hand process writes only once a sufficiently large chunk has accumulated. In many situations, you could thus just use stdbuf -oL or similar to disable this (tcpdump's own -l flag does the same job). However...
    • gzip by its nature cannot operate completely unbuffered. This is because block-based compression algorithms need to collect data into, well, blocks, analyze that content in bulk, and so on.

    So, if gzip and tcpdump are terminated at the same time, that means there's no assurance that tcpdump will actually be able to flush its output buffer, and then have gzip read, compress and write that flushed data, before gzip itself exits from the signal it received at the same time.
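
The failure described above can be reproduced without tcpdump or root access. The following is a minimal sketch, not part of the original answer: demo.fifo and demo.gz are hypothetical scratch names, an echo stands in for tcpdump, and SIGTERM stands in for Ctrl+C's SIGINT because a shell script starts background jobs with SIGINT ignored. With GNU gzip the archive should come out truncated (typically empty), since gzip dies with the data still sitting in its buffer.

```shell
#!/bin/sh
# Sketch: reproduce the "empty test.gz" symptom without tcpdump.
# Assumptions: demo.fifo / demo.gz are scratch names, and mktemp -d exists.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkfifo demo.fifo

# Start gzip reading from the FIFO, as it would read from the pipe.
gzip -c <demo.fifo >demo.gz & gzip_pid=$!

exec 3>demo.fifo                  # hold the write end open, like a running tcpdump
echo "one short packet line" >&3  # far too little data to fill a compression block
sleep 1                           # give gzip time to read it
kill -TERM "$gzip_pid"            # stand-in for the SIGINT gzip gets from Ctrl+C
wait "$gzip_pid"
exec 3>&-

# gzip -t checks the archive's integrity; a killed gzip leaves it incomplete.
if gzip -t demo.gz 2>/dev/null; then
  result="intact"
else
  result="truncated"
fi
echo "archive is $result"
cd / && rm -rf "$workdir"
```

The point of the sketch is that the data was written by the producer and read by gzip, yet never reached the file, which is why the fixes below all keep gzip alive until its input is closed.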


    Fixing The Problem

    Note that the code snippets under headers containing the word "Interactive" are intended for interactive use.


    A Reliable Interactive Workaround (For Bash)

    As a surefire solution, move gzip completely out-of-band, so it isn't sent a SIGINT when you press Ctrl+C on the tcpdump command:

    exec 3> >(gzip -c >test.gz)  # Make FD 3 point to gzip
    tcpdump -l -i eth0 >&3       # run tcpdump **AS A SEPARATE COMMAND** writing to that fd
    exec 3>&-                    # later, after you cancelled tcpdump, close the FD.
    

    A Reliable Interactive Workaround (For Any POSIX Shell)

    Same thing, but slightly longer and not relying on process substitution:

    mkfifo test.fifo                            # create a named FIFO
    gzip -c <test.fifo >test.gz & gzip_pid="$!" # start gzip, reading from that named FIFO
    tcpdump -l -i eth0 >test.fifo               # start tcpdump, writing to that named FIFO
    rm test.fifo                                # delete the FIFO when done
    wait "$gzip_pid"                            # ...and wait for gzip to exit
    

    Note that wait returns the exit status of the gzip process, so you can determine whether it encountered an error.
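
To illustrate checking that exit status, here is a hedged sketch of the same FIFO recipe with a stand-in producer, so it can be tried without root or a network interface. The printf producer, the scratch directory, and the gzip_status and lines variables are assumptions made for illustration; in real use the printf line would be the tcpdump command.

```shell
#!/bin/sh
# Sketch: the FIFO recipe with a stand-in producer instead of tcpdump.
# Assumption: mktemp -d is available for a scratch directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

mkfifo test.fifo
gzip -c <test.fifo >test.gz & gzip_pid=$!

# Stand-in for: tcpdump -l -i eth0 >test.fifo
printf '%s\n' "packet 1" "packet 2" "packet 3" >test.fifo

rm test.fifo
wait "$gzip_pid"; gzip_status=$?   # exit status of gzip itself

# Trust the archive only if gzip exited cleanly and the file passes gzip -t.
if [ "$gzip_status" -eq 0 ] && gzip -t test.gz; then
  lines=$(gzip -dc test.gz | wc -l | tr -d ' ')
  echo "gzip succeeded; archive holds $lines lines"
else
  echo "gzip failed with status $gzip_status" >&2
fi
cd / && rm -rf "$workdir"
```

Because the producer closes the FIFO when it finishes, gzip sees end-of-file, flushes its final block, and only then exits, which is what makes the status from wait meaningful.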


    A Reliable Scripted Workaround (For Any POSIX Shell)

    If we're running a script, then it's appropriate to set up a signal handler so we can handle SIGINT (by killing only tcpdump) explicitly:

    #!/bin/sh
    [ "$#" -gt 0 ] || {
      echo "Usage: ${0##*/} file.tcpdump.gz [tcpdump-args]" >&2
      echo "  Example: ${0##*/} foo.tcpdump.gz -l -i eth0" >&2
      exit 1
    }
    outfile=$1; shift
    fifo=test-$$.fifo # for real code, put this in a unique temporary directory
    
    trap '[ -n "$tcpdump_pid" ] && kill "$tcpdump_pid"' INT
    trap 'rm -f -- "$fifo"' EXIT
    
    rm -f -- "$fifo"; mkfifo "$fifo" || exit
    gzip -c >"$outfile" <"$fifo" & gzip_pid=$!
    
    # avoid trying to run tcpdump if gzip obviously failed to start
    { [ -n "$gzip_pid" ] && [ "$gzip_pid" -gt 0 ] && kill -0 "$gzip_pid"; } || exit 1
    
    tcpdump "$@" >"$fifo" & tcpdump_pid=$!
    
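    # wait for tcpdump (this wait returns early when the INT trap fires), then
    # always wait for gzip so the compressed file is complete before we exit;
    # the script's exit status is that of gzip
    wait "$tcpdump_pid"
    wait "$gzip_pid"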
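
The comment on the fifo= line suggests a unique temporary directory for real code. One possible way to do that is sketched below, assuming mktemp -d is available (POSIX does not specify it, though GNU and BSD systems provide it). The tmpdir and fifo_ok variables and the tcpdump-gz.XXXXXX template are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: create the FIFO inside a private temporary directory.
# Assumption: mktemp -d exists on this system.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/tcpdump-gz.XXXXXX") || exit 1
fifo=$tmpdir/capture.fifo

# Remove the whole directory (and the FIFO in it) when the script exits.
trap 'rm -rf -- "$tmpdir"' EXIT

# -m 600 keeps other users from reading or writing the FIFO.
mkfifo -m 600 "$fifo" || exit 1

if [ -p "$fifo" ]; then
  fifo_ok=yes
else
  fifo_ok=no
fi
echo "FIFO created: $fifo_ok"
```

With this in place, the script's own EXIT trap would remove "$tmpdir" rather than only "$fifo", and the rm -f before mkfifo becomes unnecessary because the directory is freshly created.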