
Logstash/Grok: Read substring from field using regex


I'm trying to extract a substring from my request_uri field in Logstash. Grok already splits my Apache access-log line into several fields, so I get the request_uri in its own field. Now I want to get the root context of the URI, i.e. the first path segment.

/en/some/stuff
/ApplicationName/some/path
/fr/some/french/stuff

But I don't know how to store en, ApplicationName, or fr in a field of its own (in addition to the others). I'm thinking something like this might work:

grok {
            pattern => "\"%{GREEDYDATA:domain}\" - %{IP:client_ip} \[%{GREEDYDATA:log_timestamp}\] \"%{WORD:method}\" \"%{GREEDYDATA:request_uri}\" - \"%{GREEDYDATA:query_string}\" - \"%{GREEDYDATA:protocol}\" - %{NUMBER:http_statuscode} %{NUMBER:bytes} \"%{GREEDYDATA:user_agent}\" %{NUMBER:seconds} %{NUMBER:milliseconds} \"%{GREEDYDATA:server_node}\""
            match => [ "new_context_field", "SOME-REGEX to parse request_uri" ]
        }

Can you give me a hint?


Solution

  • Thanks for your help. I solved it with this grok config, which is pretty similar to your suggestion.

    grok {
        patterns_dir => "/path/to/elk-stack/logstash-1.4.2/bin/custom_patterns"
    
        match => [ "message", "\"%{GREEDYDATA:domain}\" - %{IP:client_ip} \[%{GREEDYDATA:log_timestamp}\] \"%{WORD:method}\" \"%{GREEDYDATA:request_uri}\" - \"%{GREEDYDATA:query_string}\" - \"%{GREEDYDATA:protocol}\" - %{NUMBER:http_statuscode} %{NUMBER:bytes} \"%{GREEDYDATA:user_agent}\" %{NUMBER:seconds} %{NUMBER:milliseconds} \"%{GREEDYDATA:server_node}\""]
        match => [ "request_uri", "%{CONTEXTFROMURI:context}" ]
    
        break_on_match => false
    }
    

    To use multiple matches in a single grok block, make sure to include break_on_match => false. Otherwise, the second match is skipped if the first one succeeds.
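
    The definition of CONTEXTFROMURI lives in the custom patterns directory and is not shown above. As a rough sketch of the same idea without a patterns file, grok's inline named-capture syntax (?<name>regex) can pull the first path segment directly. The regex below is an assumption about what "root context" means here (everything between the leading slash and the next slash), not necessarily the pattern actually used:

    ```
    filter {
      grok {
        # Assumed regex: capture the first path segment of request_uri
        # into a new field named "context",
        # e.g. "/en/some/stuff" -> context = "en"
        match => [ "request_uri", "^/(?<context>[^/]+)" ]
      }
    }
    ```

    Placed in its own grok block after the one that parses "message", this does not need break_on_match => false, since each block has only one match. A request_uri with no first segment (a bare "/") would not match, so the event would get grok's failure tag, _grokparsefailure by default.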