
How to enable GC logging for Apache Hadoop NameNodes, while preventing log file overwrites and capping disk space usage


We recently decided to enable GC logging for Hadoop NameNodes on a number of clusters (exact version varies) as an aid to investigating NameNode-related memory and garbage collection problems. We want this for all NameNodes in the cluster (active and standby, or primary and secondary). We also want to avoid two problems we know might happen:

  • overwriting of the log file when a NameNode restarts for any reason
  • the logs using too much disk space, leading to disks getting filled

When Java GC logging starts for a process, it replaces the content of any existing file with the same name. This means that unless you are careful, you will lose the GC logging across restarts, perhaps exactly when you are most likely to need it.

If you keep the cluster running long enough, log files will fill up the disk unless they are managed. Even if GC logging is not currently voluminous, we want to manage the risk of an unusual situation arising that causes the logging rate to spike.


Solution

  • There are a couple of choices of environment variables that can be set in hadoop-env.sh to pass JVM parameters to the NameNode when it starts:

    • HADOOP_OPTS (applies to all client and server invocations started with "hadoop", even "hadoop dfs")
    • HADOOP_NAMENODE_OPTS (with NameNode HA this affects active and standby NameNodes and without NameNode HA affects primary NameNode)
    • HADOOP_SECONDARYNAMENODE_OPTS (affects secondary NameNode, if there is no NameNode HA)

    The question is about the NameNode specifically, so we would want to add the JVM parameters to HADOOP_NAMENODE_OPTS and HADOOP_SECONDARYNAMENODE_OPTS (where relevant). Now let's discuss the JVM parameters to include there.

    To enable GC logging to a file, you will need to add -verbose:gc -Xloggc:<log-file-location>.

    You need to give the log file name special consideration to prevent overwrites whenever the NameNode is restarted. You need a unique name for every invocation, so appending a timestamp seems like the best option. You can include something like `date +'%Y%m%d%H%M'` to add a timestamp in the form YYYYMMDDHHMM. In some versions of Java you can instead put "%t" in the log file location and the JVM will replace it with the NameNode start-up timestamp, formatted as YYYY-MM-DD_HH-MM-SS.
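    As a concrete sketch of the timestamp approach (the directory path here is a made-up example; point it at your real log directory):

```shell
# Build a GC log file name that is unique per NameNode start-up.
# GC_LOG_DIR is an assumed example path, not a Hadoop default.
GC_LOG_DIR="/var/log/hadoop/hdfs"
TIMESTAMP=$(date +'%Y%m%d%H%M')   # e.g. 202401311530 (YYYYMMDDHHMM)
GC_LOG_FILE="$GC_LOG_DIR/hdfs-namenode-gc.log-$TIMESTAMP"
echo "$GC_LOG_FILE"
```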

    Now on to managing disk space usage. I'll be happy if there is a simpler way than what I have.

    First, take advantage of Java's built-in GC log file rotation. For example, -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M enables rotation with up to 10 GC log files from the JVM, each no more than 10MB in size: 10 x 10MB is 100MB maximum usage.

    With GC log file rotation in place with up to 10 files, the suffixes '.0', '.1', ... '.9' will be appended to the file name you gave in -Xloggc. Writing starts with .0, and after .9 fills up it wraps around to replace .0 and continues in a round-robin manner. In some versions of Java, '.current' will additionally be put on the end of the name of the log file currently being written to.
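    To make the naming concrete, here is a quick sketch of the on-disk names you would see for one invocation (the base name is a made-up example):

```shell
# With -XX:NumberOfGCLogFiles=10, the JVM appends .0 through .9 to the
# base name passed via -Xloggc; these are the file names on disk.
BASE="hdfs-namenode-gc.log-202401311530"
NAMES=$(for n in 0 1 2 3 4 5 6 7 8 9; do echo "$BASE.$n"; done)
echo "$NAMES"
```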

    Due to the unique file naming we apparently need in order to avoid overwrites, each NameNode process invocation can use up to 100MB, so rotation alone is not a total solution to managing disk space used by NameNode GC logs. You will end up with a set of up to 10 GC log files per invocation on each NameNode server -- this can add up over time. The best solution (under *nix) to that would seem to be to use the logrotate utility (or some other utility) to periodically clean up NameNode GC logs that have not been modified in the last N days.
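    One way to do that cleanup without a full logrotate config is a daily cron job built on find. This is only a sketch: the directory, name pattern, and 30-day threshold are assumptions you should adapt to match your -Xloggc setting.

```shell
# Remove NameNode GC logs not modified in the last 30 days.
# LOG_DIR and the pattern must match what you configured in -Xloggc.
LOG_DIR="/var/log/hadoop/hdfs"
if [ -d "$LOG_DIR" ]; then
  # -type f skips directories; -mtime +30 matches files older than 30 days
  find "$LOG_DIR" -name 'hdfs-namenode-gc.log-*' -type f -mtime +30 -delete
fi
```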

    Be sure to do the math and make sure you will have enough disk space.
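    For instance, the worst-case usage under the rotation settings above can be checked with a little shell arithmetic (the number of retained invocations is an assumption that depends on your cleanup policy):

```shell
# Worst-case GC log disk usage, given rotation settings
# -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M.
NUM_FILES=10
FILE_SIZE_MB=10
PER_INVOCATION_MB=$((NUM_FILES * FILE_SIZE_MB))
# If cleanup keeps logs from, say, 5 past invocations:
KEPT_INVOCATIONS=5
MAX_RETAINED_MB=$((PER_INVOCATION_MB * KEPT_INVOCATIONS))
echo "max per invocation: ${PER_INVOCATION_MB}MB"   # 100MB
echo "max retained:       ${MAX_RETAINED_MB}MB"     # 500MB
```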

    People frequently want more details and context in their GC logs than the default, so consider adding in -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps.

    Putting this together, you might add something like this to hadoop-env:

    TIMESTAMP=`date +'%Y%m%d%H%M'`
    # location/name prior to .n addition by log rotation
    NAMENODE_GC_LOG_NAME="{{hdfs_log_dir_prefix}}/$USER/hdfs-namenode-gc.log-$TIMESTAMP"
    
    NAMENODE_GC_LOG_ENABLE_OPTS="-verbose:gc -Xloggc:$NAMENODE_GC_LOG_NAME"
    NAMENODE_GC_LOG_ROTATION_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M"
    NAMENODE_GC_LOG_FORMAT_OPTS="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"
    
    NAMENODE_GC_LOG_OPTS="$NAMENODE_GC_LOG_ENABLE_OPTS $NAMENODE_GC_LOG_ROTATION_OPTS $NAMENODE_GC_LOG_FORMAT_OPTS" 
    

    That should be inserted somewhere above where HADOOP_NAMENODE_OPTS and HADOOP_SECONDARYNAMENODE_OPTS are set. Then add $NAMENODE_GC_LOG_OPTS onto HADOOP_NAMENODE_OPTS and HADOOP_SECONDARYNAMENODE_OPTS (if present), in place of any existing GC-logging-related parameters.
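    A sketch of that wiring follows. The existing contents of these variables vary by distribution, so the -Dhadoop.security.logger setting and the -Xloggc path here are just placeholder examples:

```shell
# Assumes NAMENODE_GC_LOG_OPTS was assembled earlier in hadoop-env.sh;
# a literal example value is used here so the snippet stands alone.
NAMENODE_GC_LOG_OPTS="-verbose:gc -Xloggc:/tmp/hdfs-namenode-gc.log-202401311530"
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS $NAMENODE_GC_LOG_OPTS $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="$NAMENODE_GC_LOG_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
```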

    In the above, you can change {{hdfs_log_dir_prefix}}/$USER to wherever you want the GC logs to go (you probably want them to go to the same place as the NameNode logs). You can change the log file naming too.

    If you are managing your Hadoop cluster with Apache Ambari, then these changes would be in HDFS service > Configs > Advanced > Advanced hadoop-env > hadoop-env template. With Ambari, {{hdfs_log_dir_prefix}} will be automatically replaced with the Hadoop Log Dir Prefix defined a few rows above the field.

    GC logging will start happening upon NameNode restart. If you have NameNode HA and both NameNodes are online, then you can restart one at a time without any downtime.