Some context
I'm writing an application that will be fed a continuous stream of strings/data. The data are log messages, but they come from different machines and different applications, so their formats may differ slightly.
My aim is to extract the individual components from each message and, regardless of the source, normalize the data so that common parts such as host, thread, time, message and level end up in a consistent form.
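For illustration, the kind of normalized record I have in mind would look something like this (the field names are just an example):

// Illustrative only: a common shape for log entries from any source.
case class LogEntry(
  host: String,
  thread: String,
  time: String,   // kept as a string here; could become a proper timestamp type later
  message: String,
  level: String
)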
Questions
I realize that tools like AWStats do log parsing, but in this case my only two options are to use a library that does it or to write something myself, and I'd rather not reinvent the wheel.
You could use parser combinators for that. E.g. this parses a tuple of integers:
import scala.util.parsing.combinator.RegexParsers
object Parser extends RegexParsers {
  // A regex token matching one or more digits.
  val INT = "[0-9]+".r

  // Two integers separated by a comma, mapped to an (Int, Int) pair.
  def intPair = INT ~ "," ~ INT ^^ { case a ~ _ ~ b => (a.toInt, b.toInt) }
}

Parser.parseAll(Parser.intPair, "10,22").get // => (10,22)
Here is a good starting point: http://www.codecommit.com/blog/scala/the-magic-behind-parser-combinators
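Applied to your log lines, a sketch could look something like the following. The sample line format, field names and regexes here are assumptions; in practice you would define one small set of parsers per source format and map them all onto the same case class:

import scala.util.parsing.combinator.RegexParsers

// Sketch only: assumes single-line entries shaped like
// "2013-05-01 10:22:01 host1 [main] INFO something happened"
object LogParser extends RegexParsers {
  case class LogEntry(time: String, host: String, thread: String, level: String, message: String)

  val TIME    = """\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}""".r
  val HOST    = """\S+""".r
  val THREAD  = """\[[^\]]+\]""".r
  val LEVEL   = "TRACE" | "DEBUG" | "INFO" | "WARN" | "ERROR" | "FATAL"
  val MESSAGE = """.+""".r

  def entry: Parser[LogEntry] =
    TIME ~ HOST ~ THREAD ~ LEVEL ~ MESSAGE ^^ {
      case time ~ host ~ thread ~ level ~ message =>
        LogEntry(time, host, thread.stripPrefix("[").stripSuffix("]"), level, message)
    }
}

LogParser.parseAll(LogParser.entry, "2013-05-01 10:22:01 host1 [main] INFO something happened").get
// => LogEntry(2013-05-01 10:22:01,host1,main,INFO,something happened)

Note that scala.util.parsing.combinator ships with the standard library in older Scala versions; in recent versions it lives in the separate scala-parser-combinators module.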