Tags: java, hadoop, hdfs, hadoop-yarn, ioexception

Jobs finishing successfully even though IOException occurs


I receive various IOExceptions on my master node when running GridMix, and I wonder whether this is something I should really be concerned about, or just something transient, since my jobs are finishing successfully:

IOException: Bad connect ack with firstBadLink: \
java.io.IOException: Bad response ERROR for block BP-49483579-10.0.1.190-1449960324681:blk_1073746606_5783 from datanode 10.0.1.192:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)

Solution

  • I cannot be sure without knowing your complete setup, but most likely these exceptions occur while the pipeline is being set up for an append; in code terms, when stage == BlockConstructionStage.PIPELINE_SETUP_APPEND.

    In any case, since your jobs are finishing successfully, you need not worry. They finish successfully because, when opening a DataOutputStream to a DataNode pipeline fails with an exception, the client simply keeps retrying until a pipeline is set up.

    The exception is thrown from org.apache.hadoop.hdfs.DFSOutputStream; the important code snippets are shown below.

     private boolean createBlockOutputStream(DatanodeInfo[] nodes, long newGS, boolean recoveryFlag) {
        //Code..
        if (pipelineStatus != SUCCESS) {
          if (pipelineStatus == Status.ERROR_ACCESS_TOKEN) {
            throw new InvalidBlockTokenException(
                "Got access token error for connect ack with firstBadLink as "
                    + firstBadLink);
          } else {
            throw new IOException("Bad connect ack with firstBadLink as "
                + firstBadLink);
          }
        }
        //Code..
    }
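
    To make the append stage concrete, here is a minimal, hypothetical client-side sketch (the file path and payload are assumptions, not taken from your setup); an append like this goes through DFSOutputStream and the PIPELINE_SETUP_APPEND stage, which is exactly where the createBlockOutputStream shown above can throw the "Bad connect ack" IOException:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendExample {
        public static void main(String[] args) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Appending to an existing file triggers PIPELINE_SETUP_APPEND;
            // transient "Bad connect ack" errors during pipeline setup are
            // retried internally by the client.
            try (FSDataOutputStream out = fs.append(new Path("/user/test/existing-file"))) {
                out.writeBytes("one more record\n");
            }
        }
    }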
    

    Now, createBlockOutputStream is called from setupPipelineForAppendOrRecovery, and as the Javadoc for that method says, "It keeps on trying until a pipeline is setup".

    /**
     * Open a DataOutputStream to a DataNode pipeline so that 
     * it can be written to.
     * This happens when a file is appended or data streaming fails
     * It keeps on trying until a pipeline is setup
     */
    private boolean setupPipelineForAppendOrRecovery() throws IOException {
        //Code..
        while (!success && !streamerClosed && dfsClient.clientRunning) {
            //Code..
            success = createBlockOutputStream(nodes, newGS, isRecovery);
        }
        //Code..
    }
    

    If you go through the complete org.apache.hadoop.hdfs.DFSOutputStream code, you will see that pipeline setup keeps being retried until a pipeline is created, whether for an append or for a fresh write.
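
    The practical upshot for your jobs: broadly, close() on the HDFS output stream does not return until the written data has been acknowledged by the pipeline (otherwise it throws), so output that your GridMix tasks finish writing really is on the DataNodes despite those warnings. A small, hypothetical sketch for the fresh-write case (path and payload are placeholders):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FreshWriteExample {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/pipeline-check");

            // Creating a new file sets up a fresh DataNode pipeline; transient
            // pipeline errors are retried inside DFSOutputStream.
            try (FSDataOutputStream out = fs.create(p, true)) {
                out.writeBytes("payload\n");
            } // close() returns only after the data has been acknowledged

            // If close() returned normally, the file is complete and readable.
            System.out.println("Written " + fs.getFileStatus(p).getLen() + " bytes");
        }
    }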

    If you want to address it, you can try adjusting the dfs.datanode.max.xcievers property in hdfs-site.xml; many people have reported that this resolved the issue for them. Note that you need to restart your Hadoop services after setting the property.

    <property>
            <name>dfs.datanode.max.xcievers</name>
            <value>8192</value>
    </property>
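
    For reference, a quick way to confirm which value a node will actually pick up is to load hdfs-site.xml into a Configuration and read the key back. A minimal sketch; the resource path and the 4096 fallback are assumptions, and note that newer Hadoop releases deprecate this key in favor of dfs.datanode.max.transfer.threads:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class CheckXceiverLimit {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Path to the cluster's hdfs-site.xml is an assumption; adjust as needed.
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

            // dfs.datanode.max.xcievers is the legacy key; recent versions map it
            // to dfs.datanode.max.transfer.threads via deprecation handling.
            int limit = conf.getInt("dfs.datanode.max.xcievers",
                    conf.getInt("dfs.datanode.max.transfer.threads", 4096));
            System.out.println("Effective DataNode transfer thread limit: " + limit);
        }
    }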