
Flume Regex Filtering Interceptor is not working as expected


I am trying to implement a simple Flume test app, like the one in the Flume User Guide, except that I want to use log4j as the source and accept only the log events that match a regular expression. So I have implemented a random log generator and configured log4j and Flume like this:

log4j.properties

# Root logger option
log4j.rootLogger=ALL, stdout
#
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
#
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 41414

# configure a class's logger to output to the flume appender
log4j.logger.generator.LogGenerator = INFO,flume

log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

secondExample.conf

# Flume test file
# Listens via Avro RPC on port 41414 and dumps data received to the log

agent.channels = ch-1
agent.sources = src-1
agent.sinks = sink-1

agent.channels.ch-1.type = memory
agent.channels.ch-1.capacity = 10000000
agent.channels.ch-1.transactionCapacity = 1000

agent.sources.src-1.type = avro
agent.sources.src-1.channels = ch-1
agent.sources.src-1.bind = localhost
agent.sources.src-1.port = 41414

agent.sources.src-1.interceptors = intrcptr
agent.sources.src-1.interceptors.intrcptr.type = regex_filter
agent.sources.src-1.interceptors.intrcptr.regex = "ERROR [0-4]:"

agent.sinks.sink-1.type = logger
agent.sinks.sink-1.channel = ch-1
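For reference, the Flume User Guide lists two settings for the `regex_filter` interceptor: `regex` (default `.*`) and `excludeEvents` (default `false`). With `excludeEvents = false`, the interceptor treats the event body as text and keeps only events in which the pattern is found; with `true`, matching events are dropped instead. The fragment below is a sketch that spells out the default and writes the pattern without the surrounding double quotes, since a Java properties file keeps quote characters as part of the value. Under these settings, of the sample lines below only the `ERROR 3:` and `ERROR 2:` events would be expected to reach the logger sink.

```properties
# Sketch: the same interceptor as in secondExample.conf, with the default made explicit
agent.sources.src-1.interceptors = intrcptr
agent.sources.src-1.interceptors.intrcptr.type = regex_filter
# No surrounding quotes: a properties file would keep them as literal characters of the pattern
agent.sources.src-1.interceptors.intrcptr.regex = ERROR [0-4]:
# false (default): keep only matching events; true: drop matching events
agent.sources.src-1.interceptors.intrcptr.excludeEvents = false
```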

Sample generated logs:

2013-11-18 15:27:06 ERROR LogGenerator:33 - ERROR 3: 68290a60-8c25-494d-9d0d-4361a01f065f
2013-11-18 15:27:35 WARN  LogGenerator:33 - ERROR 2: 154c4779-ad6a-4b10-9ba7-199a02ad7554
2013-11-18 15:28:49 WARN  LogGenerator:33 - ERROR 5: a2a94b78-e387-4937-b6b3-c480e2c7ea76
2013-11-18 15:29:35 FATAL LogGenerator:33 - ERROR 6: 49baaa6b-19cb-48c8-9f92-b7a75f8d04dc

The problem is that the Regex Filtering Interceptor does not exclude any events: the logger sink logs every generated log message. I found the interceptor's source code and wrote a small test against the generated logs, using a slightly modified intercept method (one that takes and returns Strings instead of Events), and it works as expected.

I am really confused right now and tend to think that this is a Flume bug. Any help will be appreciated.

P.S. I am using Flume 1.4.0 (apache-flume-1.4.0-bin).


Solution

  • This seems to be a Flume issue. It is reproducible if Flume is configured like this:

    Log4jAppender -> Avro source -> Regex interceptor -> Logger Sink

    The workaround is to configure Flume with two agents like this:

    Log4jAppender -> Avro source -> Avro sink -> Avro source -> Regex interceptor -> Logger Sink
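A minimal sketch of that two-agent chain, reusing the names from secondExample.conf where possible. The agent names `agent1`/`agent2`, the channel and sink names, and the second port `41415` are placeholders chosen for illustration, and the regex is written without quotes. Both agents can live in one file; each would be started separately, for example with `flume-ng agent --conf conf --conf-file twoAgents.conf --name agent2` and then the same command with `--name agent1` (the file name is hypothetical).

```properties
# Agent 1: Log4jAppender -> Avro source -> memory channel -> Avro sink
agent1.sources = src-1
agent1.channels = ch-1
agent1.sinks = avro-out

# Same port the Log4jAppender in log4j.properties sends to
agent1.sources.src-1.type = avro
agent1.sources.src-1.bind = localhost
agent1.sources.src-1.port = 41414
agent1.sources.src-1.channels = ch-1

agent1.channels.ch-1.type = memory

# Forward everything to the second agent (41415 is a placeholder port)
agent1.sinks.avro-out.type = avro
agent1.sinks.avro-out.hostname = localhost
agent1.sinks.avro-out.port = 41415
agent1.sinks.avro-out.channel = ch-1

# Agent 2: Avro source -> regex interceptor -> memory channel -> logger sink
agent2.sources = avro-in
agent2.channels = ch-2
agent2.sinks = sink-1

agent2.sources.avro-in.type = avro
agent2.sources.avro-in.bind = localhost
agent2.sources.avro-in.port = 41415
agent2.sources.avro-in.channels = ch-2

# The filter now sits on the second hop's source
agent2.sources.avro-in.interceptors = intrcptr
agent2.sources.avro-in.interceptors.intrcptr.type = regex_filter
agent2.sources.avro-in.interceptors.intrcptr.regex = ERROR [0-4]:

agent2.channels.ch-2.type = memory

agent2.sinks.sink-1.type = logger
agent2.sinks.sink-1.channel = ch-2
```

Starting `agent2` first means its Avro source is already listening when `agent1`'s Avro sink tries to connect.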