
Separate output values from a single grok query?


I've been capturing web logs using logstash, and specifically I'm trying to capture web URLs, but also split them up.

If I take an example log entry URL: "GET https://www.stackoverflow.com:443/some/link/here.html HTTP/1.1"

I use this grok pattern:

\"(?:%{NOTSPACE:http_method}|-)(?:%{SPACE}http://)?(?:%{SPACE}https://)?(%{NOTSPACE:http_site}:)?(?:%{NUMBER:http_site_port:int})?(?:%{GREEDYDATA:http_site_url})? (?:%{WORD:http_type|-}/)?(?:%{NOTSPACE:http_version:float})?(?:%{SPACE})?\"

I get this:

{
  "http_method": [
    [
      "GET"
    ]
  ],
  "SPACE": [
    [
      " ",
      null,
      ""
    ]
  ],
  "http_site": [
    [
      "www.stackoverflow.com"
    ]
  ],
  "BASE10NUM": [
    [
      "443"
    ]
  ],
  "http_site_url": [
    [
      "/some/link/here.html"
    ]
  ],
  "http_type": [
    [
      "HTTP"
    ]
  ]
}

The trouble is, I'm trying to ALSO capture the entire URL: https://www.stackoverflow.com:443/some/link/here.html

So in total, I'm seeking 4 separate outputs:

http_site_complete https://www.stackoverflow.com:443/some/link/here.html

http_site www.stackoverflow.com

http_site_port 443

http_site_url /some/link/here.html

Is there some way to do this?


Solution

  • First, look at the built-in patterns for dealing with URLs. Putting something like URIHOST in your pattern will be easier to read and maintain than a bunch of WORDs or NOTSPACEs.

    Second, once you have lots of little fields, you can always use logstash's filters to manipulate them. You could use:

     mutate {
         # add_field takes a hash, so the field name and its value are joined with "=>"
         add_field => { "http_site_complete" => "%{http_site}:%{http_site_port}%{http_site_url}" }
     }
    

    Or you could get fancy with your regexp and use a named group:

    (?<total>%{WORD:wordOne} %{WORD:wordTwo} %{WORD:wordThree})
    

    which would individually capture three fields and make one more field from the whole string.
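
    Applied to the log line in the question, the two ideas combine into something like the sketch below. It is untested and makes some assumptions: the raw line is in a field called message, the request always has the shape method, URL, HTTP/version inside double quotes, and the stock URIPROTO, IPORHOST, POSINT, URIPATHPARAM and NUMBER patterns are available. IPORHOST and POSINT are used instead of URIHOST so that the host and port land in the field names the question asks for.

    ```
    filter {
      grok {
        # The outer named group (?<http_site_complete>...) keeps the whole URL,
        # while the patterns nested inside it still fill their own fields.
        # Single quotes around the pattern avoid escaping the literal double quotes.
        match => {
          "message" => '"%{WORD:http_method} (?<http_site_complete>%{URIPROTO}://%{IPORHOST:http_site}(?::%{POSINT:http_site_port:int})?%{URIPATHPARAM:http_site_url}) HTTP/%{NUMBER:http_version}"'
        }
      }
    }
    ```

    With the sample entry this should populate http_site_complete, http_site, http_site_port and http_site_url in one pass, and unlike the mutate approach the complete field keeps the https:// scheme, because the scheme sits inside the outer group.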