I receive various IOExceptions on my master node when running GridMix, and I wonder whether this is something I should really be concerned about, or whether it is transient, since my jobs are finishing successfully:
IOException: Bad connect ack with firstBadLink: \
java.io.IOException: Bad response ERROR for block BP-49483579-10.0.1.190-1449960324681:blk_1073746606_5783 from datanode 10.0.1.192:50010
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)
I cannot be sure without knowing your complete setup, but most likely these exceptions occur while the pipeline is being set up for an append; in terms of the code, stage == BlockConstructionStage.PIPELINE_SETUP_APPEND.
In any case, since your jobs are finishing successfully, you need not worry. The reason they finish is that when the client tries to open a DataOutputStream to a DataNode pipeline and an exception occurs, it keeps retrying until a pipeline is set up.
The exception is thrown from org.apache.hadoop.hdfs.DFSOutputStream; the important code snippets are below.
private boolean createBlockOutputStream(DatanodeInfo[] nodes, long newGS, boolean recoveryFlag) {
  //Code..
  if (pipelineStatus != SUCCESS) {
    if (pipelineStatus == Status.ERROR_ACCESS_TOKEN) {
      throw new InvalidBlockTokenException(
          "Got access token error for connect ack with firstBadLink as "
          + firstBadLink);
    } else {
      throw new IOException("Bad connect ack with firstBadLink as "
          + firstBadLink);
    }
  }
  //Code..
}
Now, createBlockOutputStream is called from setupPipelineForAppendOrRecovery, and as the code comment for that method says, "It keeps on trying until a pipeline is setup".
/**
 * Open a DataOutputStream to a DataNode pipeline so that
 * it can be written to.
 * This happens when a file is appended or data streaming fails
 * It keeps on trying until a pipeline is setup
 */
private boolean setupPipelineForAppendOrRecovery() throws IOException {
  //Code..
  while (!success && !streamerClosed && dfsClient.clientRunning) {
    //Code..
    success = createBlockOutputStream(nodes, newGS, isRecovery);
  }
  //Code..
}
If you go through the complete org.apache.hadoop.hdfs.DFSOutputStream code, you will see that pipeline setup keeps being retried until a pipeline is created, whether for an append or for a fresh write.
If you want to address it, you can try increasing the dfs.datanode.max.xcievers property in hdfs-site.xml; many people have reported that this resolved the same problem. Please note that you need to restart your Hadoop services after setting the property.
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>