What would be the most efficient way to grab the destination IP (the address after the >) and the value of the "User-Agent:" header? I would like to write those two values to a file, one line per request, with the IP first followed by the user agent. I want to keep system resource usage to a minimum, since this will run 24x7 with the log being flushed periodically.
" > Flags [P.], cksum 0x2a6c (correct), seq 1:431, ack 1, win 17520, length 430
E.../y@.~...K...b....m.Px9.Iim/.P.Dp*l..GET /images/40eb913b4b20614fad042dc816d412fe_48.jpeg HTTP/1.1^M
Accept: image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5^M
Referer: http://sports.yahoo.com/news/nascar--scene-at-daytona--was-like-a-war-zone--005423629.html^M
Accept-Language: en-US^M
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)^M
Accept-Encoding: gzip, deflate^M
Host: socialprofiles.zenfs.com^M
DNT: 1^M
Connection: Keep-Alive^M"
I've added a URL to the well-formatted original output: docs.google.com/file/d/0B1umMHxdWKkdNzI3anBaemhuOVE/edit?usp=sharing
T -> [AP]
GET /posts/15048809/ivc/29bb?_=1362111021654 HTTP/1.1.
Host: stackoverflow.com.
Connection: keep-alive.
Accept: */*.
X-Requested-With: XMLHttpRequest.
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22.
Referer: http://stackoverflow.com/questions/15048809/tcpdump-header-info-grep-or-awk-or-sed.
Accept-Encoding: gzip,deflate,sdch.
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3.
Cookie: __qca=P0-1081552782-122603326; sgt=id=9
It would be easier to answer your question exactly if you included an exact sample of your required output in the problem description. Until then, here is a general idea of how to proceed.
$ awk '/Flags/{sub(/\.80:$/, "", $4); printf "%s\t", $4} /^User-Agent:/{sub(/^[^:]+: */, ""); print}' logTest
Output: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)^M
I've left the IP field as $4 because that is what matches my rendition of your sample data; per your comments, you can easily change it to $3.
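Rather than switching between $3 and $4 by hand, the awk program can look for the > token and take the field after it, which also copes with tcpdump's non-verbose layout (timestamp and "IP" before the source address). Below is a minimal sketch along those lines, not a tested production script. It assumes the tcpdump -A text layout from the question (destination written as ADDR.PORT:, e.g. 198.51.100.7.80:), and the function name ua_extract is made up for illustration. It also strips the trailing ^M (carriage return) and calls fflush() so each line reaches the output file as soon as it is printed.

```shell
#!/bin/sh
# Minimal sketch: read tcpdump -A text on stdin and print
# "destination-IP<TAB>User-Agent" for every User-Agent header seen.
# Assumption: the TCP summary line contains "Flags [" and the destination
# is the field after ">", written as ADDR.PORT: (e.g. 198.51.100.7.80:).
# ua_extract is a hypothetical name chosen for this example.
ua_extract() {
    awk '
        /Flags \[/ {
            ip = ""
            # Locate ">" instead of relying on a fixed $3 or $4.
            for (i = 1; i < NF; i++) {
                if ($i == ">") {
                    ip = $(i + 1)
                    sub(/\.[0-9]+:$/, "", ip)   # drop the .PORT: suffix
                    break
                }
            }
        }
        /^User-Agent:/ {
            ua = $0
            sub(/^User-Agent:[ \t]*/, "", ua)   # drop the header name
            sub(/\r$/, "", ua)                  # drop a trailing CR (^M)
            printf "%s\t%s\n", ip, ua
            fflush()                            # push the line out now
        }
    '
}
```

To run it continuously, one option is to source the file (ua_extract.sh and ua.log are placeholder names) and feed it from tcpdump with line-buffered output: . ./ua_extract.sh; tcpdump -l -n -A -s 0 'tcp dst port 80' | ua_extract >> ua.log (tcpdump typically needs root). Here -l line-buffers tcpdump's stdout, -n skips name lookups, -s 0 captures whole packets, and the dst port 80 filter keeps server responses out of the stream, so their Flags lines cannot overwrite the remembered IP and awk has less to process. The pairing relies on each User-Agent line following the most recent Flags line, so a request split across several packets while other connections interleave can be mis-paired.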
Note that I've used a tab as the separator between the IP and the User-Agent.
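The second sample in the question (the "T -> [AP]" block, which looks like ngrep output with the addresses missing) uses a different layout: the destination follows the -> token as ADDR:PORT, and each header line ends in a "." where the carriage return was rendered. A minimal sketch for that layout, assuming IPv4 addresses and assuming the final "." on a User-Agent line is always that rendered CR (a User-Agent that genuinely ends in "." would lose it); ua_extract_ngrep is a hypothetical name:

```shell
#!/bin/sh
# Minimal sketch for the "T SRC:PORT -> DST:PORT [AP]" layout on stdin.
# Prints "destination-IP<TAB>User-Agent" for every User-Agent header.
# Assumptions: IPv4 addresses, and the last "." on each header line is
# the rendered CR byte. ua_extract_ngrep is a hypothetical name.
ua_extract_ngrep() {
    awk '
        /^T / {
            ip = ""
            # The destination is the field after "->", as ADDR:PORT.
            for (i = 1; i < NF; i++) {
                if ($i == "->") {
                    ip = $(i + 1)
                    sub(/:[0-9]+$/, "", ip)     # drop the :PORT suffix
                    break
                }
            }
        }
        /^User-Agent:/ {
            ua = $0
            sub(/^User-Agent:[ \t]*/, "", ua)   # drop the header name
            sub(/\.$/, "", ua)                  # drop the rendered CR
            printf "%s\t%s\n", ip, ua
            fflush()                            # push the line out now
        }
    '
}
```

It is used the same way as a tcpdump-based filter: pipe the capture text into the function and append its output to a file.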