I'm trying to extract a substring from my request_uri field in Logstash. Grok already splits my Apache access-log line into several fields, so I get the request_uri in its own field. Now I want to get the root context of the URI.
/en/some/stuff
/ApplicationName/some/path
/fr/some/french/stuff
But I don't know how to store en, ApplicationName, or fr in its own field (in addition to the others). I'm thinking something like this might work:
grok {
pattern => "\"%{GREEDYDATA:domain}\" - %{IP:client_ip} \[%{GREEDYDATA:log_timestamp}\] \"%{WORD:method}\" \"%{GREEDYDATA:request_uri}\" - \"%{GREEDYDATA:query_string}\" - \"%{GREEDYDATA:protocol}\" - %{NUMBER:http_statuscode} %{NUMBER:bytes} \"%{GREEDYDATA:user_agent}\" %{NUMBER:seconds} %{NUMBER:milliseconds} \"%{GREEDYDATA:server_node}\""
match => [ "new_context_field", "SOME-REGEX to parse request_uri" ]
}
Can you give me a hint?
Thanks for your help. I solved it with this grok config, which is pretty similar to your suggestion:
grok {
patterns_dir => "/path/to/elk-stack/logstash-1.4.2/bin/custom_patterns"
match => [ "message", "\"%{GREEDYDATA:domain}\" - %{IP:client_ip} \[%{GREEDYDATA:log_timestamp}\] \"%{WORD:method}\" \"%{GREEDYDATA:request_uri}\" - \"%{GREEDYDATA:query_string}\" - \"%{GREEDYDATA:protocol}\" - %{NUMBER:http_statuscode} %{NUMBER:bytes} \"%{GREEDYDATA:user_agent}\" %{NUMBER:seconds} %{NUMBER:milliseconds} \"%{GREEDYDATA:server_node}\""]
match => [ "request_uri", "%{CONTEXTFROMURI:context}" ]
break_on_match => false
}
To use multiple matches in a single grok block, make sure to include break_on_match => false. Otherwise, the second match is skipped if the first one succeeds.
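The definition of CONTEXTFROMURI lives in my custom patterns file and isn't shown here. For anyone who would rather not maintain a patterns_dir, a minimal sketch of an inline alternative follows. It assumes the root context is simply the first path segment of request_uri and uses grok's named-capture syntax; the regex is an illustration, not the pattern from my file.

```
filter {
  grok {
    # Illustrative inline regex (an assumption, not the CONTEXTFROMURI definition):
    # skip the leading slash, then capture everything up to the next slash
    # into a new field called "context".
    match => [ "request_uri", "^/(?<context>[^/]+)" ]
  }
}
```

With the sample URIs from the question, /en/some/stuff should give context "en" and /ApplicationName/some/path should give context "ApplicationName". Because this is a separate grok block that only matches against request_uri, it does not need break_on_match => false.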