Tags: java, hadoop, hdfs, hadoop-yarn, ioexception

Jobs finishing successfully even though IOException occurs


I receive various IOExceptions on my master node when running GridMix, and I wonder whether this is something I should really be concerned about, or just something transient, since my jobs are finishing successfully:

IOException: Bad connect ack with firstBadLink: \
java.io.IOException: Bad response ERROR for block BP-49483579-10.0.1.190-1449960324681:blk_1073746606_5783 from datanode 10.0.1.192:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)

Solution

  • I cannot be sure without knowing your complete setup, but most likely these exceptions occur while the pipeline is being set up for an append; in code terms, when stage == BlockConstructionStage.PIPELINE_SETUP_APPEND.

    In any case, since your jobs are finishing successfully, you need not worry. They finish successfully because, when opening a DataOutputStream to a DataNode pipeline fails with an exception, the client simply keeps retrying until a pipeline is set up.

    The exception is thrown from org.apache.hadoop.hdfs.DFSOutputStream; the important code snippets are shown below.

     private boolean createBlockOutputStream(DatanodeInfo[] nodes, long newGS, boolean recoveryFlag) {
        //Code..
        if (pipelineStatus != SUCCESS) {
          if (pipelineStatus == Status.ERROR_ACCESS_TOKEN) {
            throw new InvalidBlockTokenException(
                "Got access token error for connect ack with firstBadLink as "
                    + firstBadLink);
          } else {
            throw new IOException("Bad connect ack with firstBadLink as "
                + firstBadLink);
          }
        }
        //Code..
    }
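
    To make the append stage concrete, here is a minimal, hypothetical client-side sketch (the file path and payload are assumptions, not taken from your setup); an append like this goes through DFSOutputStream and the PIPELINE_SETUP_APPEND stage, which is exactly where the createBlockOutputStream shown above can throw the "Bad connect ack" IOException:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendExample {
        public static void main(String[] args) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Appending to an existing file triggers PIPELINE_SETUP_APPEND;
            // transient "Bad connect ack" errors during pipeline setup are
            // retried internally by the client.
            try (FSDataOutputStream out = fs.append(new Path("/user/test/existing-file"))) {
                out.writeBytes("one more record\n");
            }
        }
    }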
    

    Now, createBlockOutputStream is called from setupPipelineForAppendOrRecovery, and as the Javadoc for that method says, "It keeps on trying until a pipeline is setup".

    /**
     * Open a DataOutputStream to a DataNode pipeline so that 
     * it can be written to.
     * This happens when a file is appended or data streaming fails
     * It keeps on trying until a pipeline is setup
     */
    private boolean setupPipelineForAppendOrRecovery() throws IOException {
        //Code..
        while (!success && !streamerClosed && dfsClient.clientRunning) {
            //Code..
            success = createBlockOutputStream(nodes, newGS, isRecovery);
        }
        //Code..
    }
    

    If you go through the complete org.apache.hadoop.hdfs.DFSOutputStream code, you will see that pipeline setup keeps being retried until a pipeline is created, whether for an append or for a fresh write.
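
    The practical upshot for your jobs: broadly, close() on the HDFS output stream does not return until the written data has been acknowledged by the pipeline (otherwise it throws), so output that your GridMix tasks finish writing really is on the DataNodes despite those warnings. A small, hypothetical sketch for the fresh-write case (path and payload are placeholders):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FreshWriteExample {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/pipeline-check");

            // Creating a new file sets up a fresh DataNode pipeline; transient
            // pipeline errors are retried inside DFSOutputStream.
            try (FSDataOutputStream out = fs.create(p, true)) {
                out.writeBytes("payload\n");
            } // close() returns only after the data has been acknowledged

            // If close() returned normally, the file is complete and readable.
            System.out.println("Written " + fs.getFileStatus(p).getLen() + " bytes");
        }
    }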

    If you want to address it, you can try adjusting the dfs.datanode.max.xcievers property in hdfs-site.xml; many people have reported that this resolved the issue for them. Note that you need to restart your Hadoop services after setting the property.

    <property>
            <name>dfs.datanode.max.xcievers</name>
            <value>8192</value>
    </property>
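
    For reference, a quick way to confirm which value a node will actually pick up is to load hdfs-site.xml into a Configuration and read the key back. A minimal sketch; the resource path and the 4096 fallback are assumptions, and note that newer Hadoop releases deprecate this key in favor of dfs.datanode.max.transfer.threads:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class CheckXceiverLimit {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Path to the cluster's hdfs-site.xml is an assumption; adjust as needed.
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

            // dfs.datanode.max.xcievers is the legacy key; recent versions map it
            // to dfs.datanode.max.transfer.threads via deprecation handling.
            int limit = conf.getInt("dfs.datanode.max.xcievers",
                    conf.getInt("dfs.datanode.max.transfer.threads", 4096));
            System.out.println("Effective DataNode transfer thread limit: " + limit);
        }
    }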