Tags: regex, network-programming, tcpdump

Having trouble parsing tcpdump output with regex


I'm trying to extract the value of the "Host: ..." header from an HTTP request packet.

One instance is something like this:

.$..2~.:Ka3..E..D'.@.@..M....J}.e...P...q...W................g.o3GET./.HTTP/1.1...$..2~.:Ka3..E..G'.@.@..I....J}.e...P.......W................g..\host:.domain.com..

Another is this:

.$..2~.:Ka3..E..D'.@.@..M....J}.e...P...q...W................g.o3GET./.HTTP/1.1...$..2~.:Ka3..E..G'.@.@..I....J}.e...P.......W................g..\host:.domain.com..Connection:.Keep-Alive....

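Note that this is tcpdump's ASCII output, so spaces, CR/LF and other non-printable bytes all appear as periods. I want to extract the host name (domain.com in both examples). My initial regex was: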

[hH]ost:\.(.*)..

This works for the first case but not for the second: there it extracts "domain.com..Connection:.Keep-Alive.." instead of just "domain.com".
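The failure can be reproduced with Python's `re` module. The sample strings below are only the tails of the two dumps (trimmed for brevity, which does not change the match). Because `(.*)` is greedy, it runs to the end of the input and gives back just enough characters for the final `..`, which are unescaped and therefore match any two characters.

```python
import re

# Tails of the two tcpdump ASCII samples from the question (leading bytes trimmed).
first = r"g..\host:.domain.com.."
second = r"g..\host:.domain.com..Connection:.Keep-Alive...."

# The original pattern: greedy (.*) grabs everything after "host:." and only
# backtracks far enough for the two trailing wildcards (..) to match.
original = re.compile(r"[hH]ost:\.(.*)..")

print(original.search(first).group(1))   # domain.com
print(original.search(second).group(1))  # domain.com..Connection:.Keep-Alive..
```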

I would appreciate some help with creating a general regex that works in all cases.


Solution

  • Use this:

    (?<=host:\.)(?:\.?[^.])+
    


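    • The lookbehind (?<=host:\.) asserts that the match is immediately preceded by host: followed by a period (the space after the colon in the raw header).
    • (?:\.?[^.]) matches an optional period, then one character that is not a period.
    • The + repeats that group one or more times. Single periods (as in domain.com) are accepted, but two consecutive periods can never be matched, so the match stops at the ".." that represents the CRLF ending the header line.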

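A minimal sketch of the answer's pattern applied in Python, using the full samples from the question. The helper name `extract_host` is hypothetical, and compiling with `re.IGNORECASE` is an addition not in the answer: it lets the lookbehind also accept "Host:.", as the question's own `[hH]` did.

```python
import re
from typing import Optional

# The answer's pattern. re.IGNORECASE is an added assumption so that both
# "host:." and "Host:." satisfy the lookbehind.
HOST_RE = re.compile(r"(?<=host:\.)(?:\.?[^.])+", re.IGNORECASE)


def extract_host(dump: str) -> Optional[str]:
    """Return the Host value from a tcpdump ASCII dump, or None if absent."""
    match = HOST_RE.search(dump)
    return match.group(0) if match else None


first = r".$..2~.:Ka3..E..D'.@.@..M....J}.e...P...q...W................g.o3GET./.HTTP/1.1...$..2~.:Ka3..E..G'.@.@..I....J}.e...P.......W................g..\host:.domain.com.."
second = r".$..2~.:Ka3..E..D'.@.@..M....J}.e...P...q...W................g.o3GET./.HTTP/1.1...$..2~.:Ka3..E..G'.@.@..I....J}.e...P.......W................g..\host:.domain.com..Connection:.Keep-Alive...."

print(extract_host(first))   # domain.com
print(extract_host(second))  # domain.com
```

Because each repetition of the group may consume at most one period before a non-period character, the match ends at the first `..` regardless of what follows the header.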