Search code examples
timeoutlogbackdropwizardrollingfileappender

Dropwizard logback async log rotation causing application threads to wait


The application threads get stuck whenever the log file rotates, this causes spike in latency of the API I'm using an Async Appender, not sure why during rotation the application threads are waiting.

logback.xml

<configuration debug="true">
<property name="async.discardingThreshold" value="0"/>
<property name="async.queueSize" value="500"/>
<property name="log.dir" value="/var/log"/>
<property name="log.pattern" value="%highlight(%-5level) [%date] [%thread] [%X{id}] [%cyan(%logger{0})]: %message%n"/>
<property name="errorLog.pattern" value="%highlight(%-5level) [%date] [%thread] [%X{id}] [%red(%logger{0})]: %message%n"/>
<property name="log.maxHistory" value="200"/>
<property name="log.default.maxFileSize" value="100MB"/>
<property name="log.error.maxFileSize" value="10MB"/>


<appender name="INFO" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <File>${log.dir}/default.log</File>
    <Append>true</Append>
    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
        <fileNamePattern>${log.dir}/default.%i.log.gz</fileNamePattern>
        <maxIndex>${log.maxHistory}</maxIndex>
    </rollingPolicy>
    <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
        <maxFileSize>${log.default.maxFileSize}</maxFileSize>
    </triggeringPolicy>
    <encoder>
        <pattern>%replace(${log.pattern}){'"pin":"\d+"','"pin":"XXXX"'}%n</pattern>
    </encoder>
</appender>

<appender name="ASYNC-INFO" class="ch.qos.logback.classic.AsyncAppender">
    <discardingThreshold>${async.discardingThreshold}</discardingThreshold>
    <queueSize>${async.queueSize}</queueSize>
    <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
        <OnMismatch>DENY</OnMismatch>
        <OnMatch>NEUTRAL</OnMatch>
    </filter>
    <appender-ref ref="INFO"/>
</appender>

<appender name="ERROR" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.dir}/error.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
        <fileNamePattern>${log.dir}/error.%i.log.gz</fileNamePattern>
        <maxIndex>${log.maxHistory}</maxIndex>
    </rollingPolicy>
    <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
        <maxFileSize>${log.error.maxFileSize}</maxFileSize>
    </triggeringPolicy>
    <encoder>
        <pattern>%replace(${errorLog.pattern}){'"pin":"\d+"','"pin":"XXXX"'}%n</pattern>
    </encoder>
</appender>

<appender name="ASYNC-ERROR" class="ch.qos.logback.classic.AsyncAppender">
    <discardingThreshold>${async.discardingThreshold}</discardingThreshold>
    <queueSize>${async.queueSize}</queueSize>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>ERROR</level>
    </filter>

    <appender-ref ref="ERROR"/>
</appender>


<root level="INFO">
    <appender-ref ref="ASYNC-ERROR"/>
    <appender-ref ref="ASYNC-INFO"/>
</root>


Solution

  • In our logback.xml we have specified,

    <property name="async.discardingThreshold" value="0"/>
    

    Now a quick look at the source code suggests what possibly might be happening that is causing the delay at the time of rotation

     @Override
      protected void append(E eventObject) {
        if (isQueueBelowDiscardingThreshold() && isDiscardable(eventObject)) {
          return;
        }
        preprocess(eventObject);
        put(eventObject);
      }
      private boolean isQueueBelowDiscardingThreshold() {
        return (blockingQueue.remainingCapacity() < discardingThreshold);  
      }
    

    blockingQueue.remainingCapacity() < discardingThreshold, this condition will never evaluate to true if discarding threshold is 0, what that means is the async-appender thread will try to push to an already full blocking queue, hence, will park itself and wait on it causing the application thread to wait as well.

    Setting this value to anything above 0, is causing no timeouts, however, some events might get lost.

    The other option to keep all the events without any discard would be to increase the queue size as much that at the instant of file rotation, there are no more than the size of the queue element in the queue. In which case, the async-appender thread won't wait on the blocking queue.

    So my finding is logback AsyncAppender is NOT so Async if message incoming rate exceeds the queue consumption rate and discarding rate is 0.