java, scala, pattern-matching, logging, logparser

Parse log strings into usable parts


Some context

I'm writing an application that will be fed a continuous stream of strings. The data are log messages from different machines and different applications, so their formats may differ slightly.

My aim is to extract the individual components of each message and, regardless of the source, normalize the data so that the common parts (host, thread, time, message and level) are available in a consistent form.

Questions

  1. Does log4j have any support for something like this, i.e. taking a string and returning an object that can be used to get the parts mentioned above?
  2. If not, are there any libraries available to do this or something similar?
  3. Ideally I'd like to supply multiple patterns to match against, plus a fallback that is used if none of the other patterns match. Is there anything like this?
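The matching behaviour described in point 3 (ordered patterns, first match wins, catch-all fallback) can be sketched with only the Scala standard library. This is a minimal sketch, not a library API: the two log formats, the `LogEntry` record and the `MultiPatternParser` object are hypothetical and stand in for whatever formats the real sources produce.

```scala
import scala.util.matching.Regex

// Hypothetical normalized record: only the message is guaranteed; the other
// fields are optional because not every source provides them.
final case class LogEntry(
    time: Option[String],
    host: Option[String],
    thread: Option[String],
    level: Option[String],
    message: String
)

object MultiPatternParser {
  // Hypothetical format A: "2012-05-01 10:00:00 host1 [main] INFO message text"
  private val FormatA: Regex =
    """(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) \[([^\]]+)\] (\w+) (.*)""".r

  // Hypothetical format B: "WARN|host2|worker-3|message text" (no timestamp)
  private val FormatB: Regex = """(\w+)\|([^|]+)\|([^|]+)\|(.*)""".r

  // Cases are tried top to bottom; a Regex extractor must match the whole line.
  // The final case is the fallback: keep the raw line as the message.
  def parse(line: String): LogEntry = line match {
    case FormatA(time, host, thread, level, msg) =>
      LogEntry(Some(time), Some(host), Some(thread), Some(level), msg)
    case FormatB(level, host, thread, msg) =>
      LogEntry(None, Some(host), Some(thread), Some(level), msg)
    case _ =>
      LogEntry(None, None, None, None, line)
  }
}
```

Adding a source format means adding one regex and one case above the fallback; the order of the cases is the priority order.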

I realize tools like AWStats do log parsing, but in this case my only two options are to use a library that does it or to write something myself, and I'd rather not reinvent the wheel.


Solution

  • You could use parser combinators for that. For example, this parses a comma-separated pair of integers:

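    // On Scala 2.11+ this needs the separate scala-parser-combinators dependency
    import scala.util.parsing.combinator.RegexParsers
    
    object Parser extends RegexParsers {
      // One or more digits
      val INT = "[0-9]+".r
    
      // INT "," INT: the comma is matched but discarded by ~>
      def intPair = INT ~ ("," ~> INT) ^^ { case a ~ b => (a.toInt, b.toInt) }
    }
    
    Parser.parseAll(Parser.intPair, "10,22").get // => (10,22)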

    Here is a good starting point: http://www.codecommit.com/blog/scala/the-magic-behind-parser-combinators
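Applied to the question, the same technique can describe several log formats and combine them with `|`, which tries each alternative in turn and backtracks on failure. This is a minimal sketch under stated assumptions: the two formats, the `LogEntry` record and the names `LogParser`, `formatA`, `formatB` and `parseLine` are hypothetical, not part of any library.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical normalized record; fields a source does not provide are None.
final case class LogEntry(
    time: Option[String],
    host: Option[String],
    thread: Option[String],
    level: Option[String],
    message: String
)

// Requires the scala-parser-combinators module on Scala 2.11+.
object LogParser extends RegexParsers {
  // RegexParsers skips whitespace before each token by default, so the
  // space-separated fields below need no explicit separators.
  val date   = """\d{4}-\d{2}-\d{2}""".r
  val clock  = """\d{2}:\d{2}:\d{2}""".r
  val word   = """\S+""".r
  val level  = "TRACE" | "DEBUG" | "INFO" | "WARN" | "ERROR" | "FATAL"
  val thread = "[" ~> """[^\]]+""".r <~ "]"
  val rest   = """.*""".r // remainder of the line (leading spaces are skipped)

  // Hypothetical format A: "2012-05-01 10:00:00 host1 [main] INFO message text"
  def formatA: Parser[LogEntry] =
    date ~ clock ~ word ~ thread ~ level ~ rest ^^ {
      case d ~ c ~ h ~ t ~ l ~ m => LogEntry(Some(s"$d $c"), Some(h), Some(t), Some(l), m)
    }

  // Hypothetical format B: "WARN host2 message text" (no time or thread)
  def formatB: Parser[LogEntry] =
    level ~ word ~ rest ^^ { case l ~ h ~ m => LogEntry(None, Some(h), None, Some(l), m) }

  // Alternatives are tried left to right.
  def entry: Parser[LogEntry] = formatA | formatB

  // Fallback: if no format matches the whole line, keep it as the raw message.
  def parseLine(line: String): LogEntry = parseAll(entry, line) match {
    case Success(e, _) => e
    case _             => LogEntry(None, None, None, None, line)
  }
}
```

For the continuous stream described in the question, each incoming line can be mapped through `LogParser.parseLine`, e.g. `lines.map(LogParser.parseLine)` over an `Iterator[String]`. Keeping the fallback in `parseLine` rather than as a last `|` alternative means a line that matches no format still comes back intact instead of as a parse failure.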