I searched around a bit, but I only found cases where splitting on commas or the like would work. This case is different.
To explain my problem I'll show a tiny example:
JAN 01 00:00:01 <Admin> Action, May have spaces etc.
(This is a log entry)
I'd like to parse this string into several variables. The first bit is obviously a date, without a year. The login name is listed between the angle brackets, and the log entry itself follows.
The configuration should have something like this:
{month} {day} {hour}:{minute}:{second} <{login}> {the_rest}
This will allow changes without having the whole thing hardcoded (using splits etc).
I think using Regex may be useful here, but I don't know much about it, or whether it's usable in this case at all. Speed doesn't matter much; I just don't know how to achieve this.
Thanks,
~Tgys
Regular expressions are indeed the correct tool here. First, let's see how you can use a hardcoded regular expression to parse this log.
var str = "JAN 01 00:00:01 <Admin> Action, May have spaces etc.";
var re = new Regex("^" +
@"(?<month>(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC))" +
" " +
@"(?<day>\d+)" +
" " +
@"(?<hour>\d+)" +
":" +
@"(?<the_rest>.*)" +
"$");
var match = re.Match(str);
What we did here is create a regular expression piece by piece using named capturing groups. For brevity I didn't capture all the relevant information, and I didn't spend much time considering what counts as valid input for each group (e.g. day will match 999, although that's not a valid day). All this can come later.
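To make the result of the hardcoded version concrete, here is a minimal sketch of reading the named groups back out of the match. The class name HardcodedDemo and the helper TryExtract are made up for illustration; the pattern is the same one built above, just written as a single verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HardcodedDemo
{
    // Same pattern as the piece-by-piece version, written as one verbatim string.
    private static readonly Regex LogLine = new Regex(
        @"^(?<month>(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)) (?<day>\d+) (?<hour>\d+):(?<the_rest>.*)$");

    // Hypothetical helper: returns false when the line does not match at all,
    // otherwise hands back the text captured by the named group.
    public static bool TryExtract(string line, string groupName, out string value)
    {
        var match = LogLine.Match(line);
        value = match.Success ? match.Groups[groupName].Value : string.Empty;
        return match.Success;
    }
}
```

Note that with this shortened pattern the_rest still contains the minutes, the seconds and the login, because only the part up to the first colon is split into groups.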
The next step is to nicely pull out the definition of each capturing group into a dictionary:
var groups = new Dictionary<string, string>
{
{ "month", "(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)" },
{ "day", @"\d+" },
{ "hour", @"\d+" },
{ "the_rest", ".*" },
};
Given this, we can now construct the same regex with:
var re = new Regex("^" +
string.Format("(?<month>{0})", groups["month"]) +
" " +
string.Format("(?<day>{0})", groups["day"]) +
" " +
string.Format("(?<hour>{0})", groups["hour"]) +
":" +
string.Format("(?<the_rest>{0})", groups["the_rest"]) +
"$");
OK, this is starting to look like something that can be constructed dynamically.
Let's say we want to construct it from a specification that looks like
"{month} {day} {hour}:{the_rest}"
How to do this? With another regular expression! Specifically, we will use the overload of Regex.Replace that enables replacement of a match with the result of a function:
var format = "{month} {day} {hour}:{the_rest}";
var result = Regex.Replace(format, @"\{(\w+)\}", m => groups[m.Groups[1].Value]);
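For the format above, result is the pattern (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) \d+ \d+:.* with every placeholder replaced by its definition from the dictionary.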
At this point, we can pass in a format specification and get back a regular expression that matches the input based on this format. What's left? To translate the results of matching the regular expression to the input back to a "dynamic" structure:
var format = "{month} {day} {hour}:{the_rest}";
var re = Regex.Replace(format,
@"\{(\w+)\}",
m => string.Format("(?<{0}>{1})", m.Groups[1].Value, groups[m.Groups[1].Value]));
var regex = new Regex("^" + re + "$", RegexOptions.ExplicitCapture);
var match = regex.Match(str);
At this point you can use:
- match.Success to see if the dynamically constructed expression matches the input
- regex.GetGroupNames() to get the names of the groups used in parsing
- match.Groups to get the results of parsing each group
So let's put them in a dictionary:
var results = regex.GetGroupNames().ToDictionary(n => n, n => match.Groups[n].Value);
You can now create a method Parse that allows this:
var input = "JAN 01 00:00:01 <Admin> Action, May have spaces etc.";
var format = "{month} {day} {hour}:{the_rest}";
var results = Parse(input, format);
Parse will recognize (but not allow the user to modify) expressions such as "{month}", while at the same time allowing the user to mix and match these expressions freely in order to parse the input.
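Putting the pieces together, a Parse method could look roughly like the sketch below. It is one possible arrangement, not the only one. The class name LogParser, the extra minute, second and login entries, and the choice to return null when the input does not match are my own assumptions; the walkthrough above only defines month, day, hour and the_rest. The sketch also passes the literal text between placeholders through Regex.Escape, so that characters such as "." or "(" in a format are matched literally, and it skips the group named "0", which GetGroupNames always reports for the whole match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class LogParser
{
    // Sub-pattern for each placeholder. "minute", "second" and "login" are
    // assumed additions for illustration; adjust or tighten them as needed.
    private static readonly Dictionary<string, string> Groups = new Dictionary<string, string>
    {
        { "month", "(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)" },
        { "day", @"\d+" },
        { "hour", @"\d+" },
        { "minute", @"\d+" },
        { "second", @"\d+" },
        { "login", "[^>]+" },
        { "the_rest", ".*" },
    };

    private static readonly Regex Placeholder = new Regex(@"\{(\w+)\}");

    // Returns group name -> captured text, or null when the input does not match.
    public static Dictionary<string, string> Parse(string input, string format)
    {
        var pattern = new StringBuilder("^");
        var position = 0;
        foreach (Match m in Placeholder.Matches(format))
        {
            // Literal text before this placeholder must match as-is.
            pattern.Append(Regex.Escape(format.Substring(position, m.Index - position)));

            var name = m.Groups[1].Value;
            string definition;
            if (!Groups.TryGetValue(name, out definition))
                throw new ArgumentException("Unknown placeholder: {" + name + "}");

            pattern.AppendFormat("(?<{0}>{1})", name, definition);
            position = m.Index + m.Length;
        }
        pattern.Append(Regex.Escape(format.Substring(position)));
        pattern.Append("$");

        // ExplicitCapture keeps unnamed parentheses (e.g. the month alternation)
        // from becoming numbered groups.
        var regex = new Regex(pattern.ToString(), RegexOptions.ExplicitCapture);
        var match = regex.Match(input);
        if (!match.Success)
            return null;

        // Group "0" is the whole match, not a placeholder, so leave it out.
        return regex.GetGroupNames()
                    .Where(n => n != "0")
                    .ToDictionary(n => n, n => match.Groups[n].Value);
    }
}
```

With these assumed definitions, the format from the question, "{month} {day} {hour}:{minute}:{second} <{login}> {the_rest}", can be passed to Parse unchanged.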