
NodeManager fails to start in Hadoop 2.6.0 (Connection refused)


I have installed a Hadoop 2.6.0 multi-node cluster on EC2 instances (Ubuntu 14.04, 64-bit). All daemons on the master (NameNode, SecondaryNameNode, ResourceManager) are up, but on the slave machine only the DataNode is up; the NodeManager shuts down because of a connection refused error.

Kindly help me in this regard. Thanks in advance

The log file of my NodeManager is below:

2015-09-08 07:59:36,606 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (992.5 M). Thrashing might happen.
2015-09-08 07:59:36,613 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for null: physical-memory=8192 virtual-memory=17204 virtual-cores=8
2015-09-08 07:59:36,646 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-09-08 07:59:36,666 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 53949
2015-09-08 07:59:36,688 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
2015-09-08 07:59:36,688 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2015-09-08 07:59:36,691 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-09-08 07:59:36,692 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 53949: starting
2015-09-08 07:59:36,707 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : ec2-52-88-167-9.us-west-2.compute.amazonaws.com:53949
2015-09-08 07:59:36,713 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-09-08 07:59:36,713 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
2015-09-08 07:59:36,716 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2015-09-08 07:59:36,717 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-09-08 07:59:36,717 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
2015-09-08 07:59:36,717 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
2015-09-08 07:59:36,719 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154:53949
2015-09-08 07:59:36,719 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
2015-09-08 07:59:36,719 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
2015-09-08 07:59:36,790 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-09-08 07:59:36,793 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
2015-09-08 07:59:36,805 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-09-08 07:59:36,806 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
2015-09-08 07:59:36,806 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-09-08 07:59:36,807 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-09-08 07:59:36,812 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
2015-09-08 07:59:36,812 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2015-09-08 07:59:36,820 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
2015-09-08 07:59:36,820 INFO org.mortbay.log: jetty-6.1.26
2015-09-08 07:59:36,863 INFO org.mortbay.log: Extract jar:file:/home/ubuntu/hadoop/hadoop-2.6.0/share/hadoop/yarn/hadoop-yarn-common-2.6.0.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2015-09-08 07:59:37,358 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2015-09-08 07:59:37,359 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2015-09-08 07:59:37,879 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2015-09-08 07:59:37,885 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8031
2015-09-08 07:59:37,913 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2015-09-08 07:59:37,917 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2015-09-08 07:59:38,951 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:39,956 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:40,957 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:41,957 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:42,958 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2015-09-08 08:19:48,256 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:264)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy27.registerNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy28.registerNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:191)
        ... 6 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
        at org.apache.hadoop.ipc.Client.call(Client.java:1438)
        ... 18 more
2015-09-08 08:19:48,257 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:264)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy27.registerNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy28.registerNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:191)
        ... 6 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
        at org.apache.hadoop.ipc.Client.call(Client.java:1438)
        ... 18 more
2015-09-08 08:19:48,263 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2015-09-08 08:19:48,264 INFO org.apache.hadoop.ipc.Server: Stopping server on 53949
2015-09-08 08:19:48,266 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 53949
2015-09-08 08:19:48,267 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-09-08 08:19:48,267 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2015-09-08 08:19:48,267 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2015-09-08 08:19:48,268 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2015-09-08 08:19:48,268 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-09-08 08:19:48,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2015-09-08 08:19:48,269 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2015-09-08 08:19:48,270 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2015-09-08 08:19:48,270 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2015-09-08 08:19:48,270 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager

core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-52-26-161-203.us-west-2.compute.amazonaws.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/hdfstmp</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://ec2-52-26-161-203.us-west-2.compute.amazonaws.com:8021</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Master machine:

ubuntu@ec2-52-26-161-203:~$ vim /etc/hosts

172.31.23.167 ec2-52-26-161-203.us-west-2.compute.amazonaws.com

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

ubuntu@ec2-52-26-161-203:~$ vim /etc/hadoop/masters

ec2-52-26-161-203.us-west-2.compute.amazonaws.com

ubuntu@ec2-52-26-161-203:~$ vim /etc/hadoop/slaves

ec2-52-88-167-9.us-west-2.compute.amazonaws.com

Slave Machine :

ubuntu@ec2-52-88-167-9:~$ vim /etc/hosts

172.31.29.154 ec2-52-88-167-9.us-west-2.compute.amazonaws.com

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

ubuntu@ec2-52-88-167-9:~$ vim /etc/hadoop/slaves

ec2-52-88-167-9.us-west-2.compute.amazonaws.com

ubuntu@ec2-52-26-161-203:~$ sudo netstat -lpten | grep java

tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      1000       569904      19910/java      
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      1000       570916      20136/java      
tcp        0      0 172.31.23.167:8020      0.0.0.0:*               LISTEN      1000       569911      19910/java      
tcp6       0      0 :::8088                 :::*                    LISTEN      1000       571699      20278/java      
tcp6       0      0 :::8030                 :::*                    LISTEN      1000       571690      20278/java      
tcp6       0      0 :::8031                 :::*                    LISTEN      1000       571683      20278/java      
tcp6       0      0 :::8032                 :::*                    LISTEN      1000       571695      20278/java      
tcp6       0      0 :::8033                 :::*                    LISTEN      1000       571702      20278/java 

Telnet command:

ubuntu@ec2-52-26-161-203:~$ telnet localhost 8031

Trying ::1...
Connected to localhost.
Escape character is '^]'.

Why is it using port 8031 for the ResourceManager? I haven't specified that port in any of my Hadoop configuration files (core-site.xml, mapred-site.xml, hdfs-site.xml) shown above.
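
The telnet test above runs on the master against localhost, so it confirms only that the ResourceManager is listening there. The failing connection in the log is made from the slave to 0.0.0.0:8031. As a rough sketch, the same probe could be scripted in Python and run on the slave against both targets. The helper name `can_connect` and the 3-second default timeout are illustrative choices, not part of Hadoop.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

On the slave, comparing `can_connect("0.0.0.0", 8031)` with `can_connect("ec2-52-26-161-203.us-west-2.compute.amazonaws.com", 8031)` would show whether the master's port is reachable at all. The second call also depends on the EC2 security group allowing that port, which is an assumption about the setup and not something shown in the question.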


Solution

  • I made the following changes to mapred-site.xml and yarn-site.xml, which solved my issue. Because I had not set the ResourceManager hostname property in yarn-site.xml, the NodeManager was trying to connect to the address 0.0.0.0, which caused the Connection refused exception.

    mapred-site.xml

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    

    yarn-site.xml

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>ec2-52-26-161-203.us-west-2.compute.amazonaws.com</value>
    </property>
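
Port 8031 was never set in the site files because it comes from yarn-default.xml, where `yarn.resourcemanager.resource-tracker.address` defaults to `${yarn.resourcemanager.hostname}:8031` and `yarn.resourcemanager.hostname` defaults to `0.0.0.0`. The other ports in the netstat output (8030, 8032, 8033, 8088) follow the same pattern. The sketch below is a simplified Python model of that layering and substitution, not Hadoop's actual Configuration code; `YARN_DEFAULTS` and `resolve` are hypothetical names used only for illustration.

```python
import re

# Relevant defaults from yarn-default.xml in Hadoop 2.x (not the full file).
YARN_DEFAULTS = {
    "yarn.resourcemanager.hostname": "0.0.0.0",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
    "yarn.resourcemanager.scheduler.address": "${yarn.resourcemanager.hostname}:8030",
    "yarn.resourcemanager.resource-tracker.address": "${yarn.resourcemanager.hostname}:8031",
    "yarn.resourcemanager.admin.address": "${yarn.resourcemanager.hostname}:8033",
    "yarn.resourcemanager.webapp.address": "${yarn.resourcemanager.hostname}:8088",
}


def resolve(key, site_overrides):
    """Look up key with site values layered over defaults, expanding ${...} references."""
    props = {**YARN_DEFAULTS, **site_overrides}
    value = props[key]
    # Expand references repeatedly; the bound guards against circular definitions.
    for _ in range(20):
        expanded = re.sub(
            r"\$\{([^}]+)\}",
            lambda m: props.get(m.group(1), m.group(0)),  # leave unknown references as-is
            value,
        )
        if expanded == value:
            break
        value = expanded
    return value


tracker = "yarn.resourcemanager.resource-tracker.address"

# Without yarn-site.xml overrides, the NodeManager dials 0.0.0.0:8031, as in the log.
print(resolve(tracker, {}))

# With the hostname from the fix, the same key points at the master.
master = "ec2-52-26-161-203.us-west-2.compute.amazonaws.com"
print(resolve(tracker, {"yarn.resourcemanager.hostname": master}))
```

Setting only `yarn.resourcemanager.hostname` is enough in this model because every ResourceManager address is derived from it, which is consistent with the single-property fix above.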