Logstash's grok is a string parsing tool built on top of regular expressions. It provides many predefined patterns that make string parsing much easier, and I fell in love with it the first time I used it. Unfortunately, it is written in Ruby, so I can't use it in my Python projects. Is there a Python implementation of grok, or a Python alternative that simplifies string parsing the way grok does?
I'm not aware of any Python ports of grok, but the core functionality seems pretty straightforward to implement:
import re

# Map grok-style type names to regex fragments.
types = {
    'WORD': r'\w+',
    'NUMBER': r'\d+',
    # todo: extend me
}

def compile_pattern(pat):
    # Turn each %{TYPE:name} token into a named group (?P<name>...).
    return re.sub(r'%\{(\w+):(\w+)\}',
                  lambda m: "(?P<" + m.group(2) + ">" + types[m.group(1)] + ")", pat)

rr = compile_pattern("%{WORD:method} %{NUMBER:bytes} %{NUMBER:duration}")
print(re.search(rr, "hello 123 456").groupdict())
# {'method': 'hello', 'bytes': '123', 'duration': '456'}
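If you want to take the idea a bit further, here is a rough sketch that also handles unnamed tokens such as `%{INT}` and patterns that reference other patterns, which grok's pattern files rely on. The names `PATTERNS`, `expand` and `grok_match` are made up for this example, and the regexes in the table are simplified guesses, not Logstash's actual definitions.

```python
import re

# Hypothetical pattern table: the names mirror a few grok patterns, but the
# regexes are simplified assumptions, not copies of Logstash's definitions.
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",
    # A pattern may reference other patterns; these are expanded recursively.
    "REQUEST": r"%{WORD:method} %{NUMBER:bytes}",
}

# Matches %{TYPE} or %{TYPE:field}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern, depth=0):
    """Replace every %{TYPE[:field]} token with the regex it stands for."""
    if depth > 10:
        raise ValueError("patterns nested too deeply (possible cycle)")

    def replace(match):
        type_name, field = match.group(1), match.group(2)
        if type_name not in PATTERNS:
            raise KeyError("unknown pattern: " + type_name)
        body = expand(PATTERNS[type_name], depth + 1)
        if field:
            return "(?P<%s>%s)" % (field, body)  # named capture
        return "(?:%s)" % body                   # match without capturing

    return TOKEN.sub(replace, pattern)


def grok_match(pattern, text):
    """Return a dict of captured fields, or None if the text does not match."""
    match = re.search(expand(pattern), text)
    return match.groupdict() if match else None


print(grok_match("%{IPV4:client} %{REQUEST} %{INT}", "10.0.0.1 GET 512 -3"))
# {'client': '10.0.0.1', 'method': 'GET', 'bytes': '512'}
```

All captured values come back as strings. If you need numbers, you would have to convert them yourself after matching.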