
HDFS on DC/OS closes port 22: cannot SSH into any of the HDFS nodes


I have installed HDFS from Universe on my DC/OS cluster of 10 CoreOS machines (3 master nodes, 7 agent nodes). My HA HDFS configuration has 2 name nodes, 3 journal nodes and 5 data nodes. Over time, every agent node where HDFS is running (whether it hosts a name node, a journal node or a data node) closes port 22, so I cannot SSH into it, whereas the nodes where HDFS is not running keep port 22 open. Any idea why this happens, or what I should look for to find out?
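One way to narrow this down is to record when port 22 changes state and how it fails. The following is a minimal sketch, assuming Python 3 on a machine that can reach the agents; the helper names `probe_port` and `watch` and the 60-second default interval are hypothetical, not part of DC/OS or nmap. It separates a refused connection from a timeout: "closed" (the host answers with a TCP reset) usually means nothing is listening, for example sshd has stopped, while "filtered" (no answer) usually points to a firewall dropping packets. In the nmap output below, port 22 on the HDFS nodes appears to fall under the "Not shown: ... closed ports" line rather than being reported as filtered.

```python
import socket
import time
from datetime import datetime


def probe_port(host, port=22, timeout=3.0):
    """Classify a TCP port from a single connect attempt (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied with a TCP RST: nothing is listening on the port.
        return "closed"
    except socket.timeout:
        # No reply at all: packets are probably being dropped by a firewall.
        return "filtered"
    except OSError as exc:
        # DNS failure, unreachable network, and similar errors.
        return f"error: {exc}"


def watch(hosts, port=22, interval=60.0, rounds=None):
    """Print a timestamped line whenever a host's port state changes.

    Runs forever when rounds is None; returns the last known states otherwise.
    """
    last = {}
    n = 0
    while rounds is None or n < rounds:
        for host in hosts:
            state = probe_port(host, port)
            if last.get(host) != state:
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {host}:{port} {state}")
                last[host] = state
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
    return last
```

For example, `watch([f"svc-{i}.dc01" for i in range(10, 17)])` would poll all seven agents; the timestamp of the first "closed" line for a node could then be compared with the HDFS task logs and the node's journal around that time.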

Here is the nmap output for the cluster's agent nodes. Nodes 13 and 16 have no HDFS role and keep port 22 open; all the other nodes, which run HDFS, have port 22 closed.

root@svr-17:/home/andreab# nmap svc-10.dc01

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:37 GMT
Nmap scan report for svc-10.dc01 (192.168.111.70)
Host is up (0.00061s latency).
Not shown: 997 closed ports
PORT     STATE SERVICE
5051/tcp open  ida-agent
9000/tcp open  cslistener
9003/tcp open  unknown
MAC Address: 52:54:00:DE:8E:96 (QEMU Virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.72 seconds



root@svr-17:/home/andreab# nmap svc-11.dc01

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-11.dc01 (192.168.111.71)
Host is up (0.00084s latency).
Not shown: 996 closed ports
PORT     STATE SERVICE
5051/tcp open  ida-agent
9001/tcp open  tor-orport
9002/tcp open  dynamid
9003/tcp open  unknown
MAC Address: 52:54:00:31:1A:E9 (QEMU Virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.70 seconds



root@svr-17:/home/andreab# nmap svc-12.dc01

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-12.dc01 (192.168.111.72)
Host is up (0.00082s latency).
Not shown: 996 closed ports
PORT     STATE SERVICE
5051/tcp open  ida-agent
9001/tcp open  tor-orport
9002/tcp open  dynamid
9003/tcp open  unknown
MAC Address: 52:54:00:D9:B4:F7 (QEMU Virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.69 seconds



root@svr-17:/home/andreab# nmap svc-13.dc01

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-13.dc01 (192.168.111.73)
Host is up (0.00025s latency).
Not shown: 998 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
5051/tcp open  ida-agent
MAC Address: 52:54:00:43:96:45 (QEMU Virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.69 seconds



root@svr-17:/home/andreab# nmap svc-14.dc01

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-14.dc01 (192.168.111.74)
Host is up (0.00029s latency).
Not shown: 998 closed ports
PORT     STATE SERVICE
5051/tcp open  ida-agent
9003/tcp open  unknown
MAC Address: 52:54:00:77:9D:2E (QEMU Virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.70 seconds



root@svr-17:/home/andreab# nmap svc-15.dc01

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:39 GMT
Nmap scan report for svc-15.dc01 (192.168.111.75)
Host is up (0.00020s latency).
Not shown: 998 closed ports
PORT     STATE SERVICE
5051/tcp open  ida-agent
9003/tcp open  unknown
MAC Address: 52:54:00:B9:03:FA (QEMU Virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 23.51 seconds



root@svr-17:/home/andreab# nmap svc-16.dc01

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:39 GMT
Nmap scan report for svc-16.dc01 (192.168.111.76)
Host is up (0.00065s latency).
Not shown: 998 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
5051/tcp open  ida-agent
MAC Address: 52:54:00:E8:D6:07 (QEMU Virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.72 seconds
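With more nodes or repeated scans, the pattern above can be summarized automatically. This is a minimal sketch, assuming nmap's normal (human-readable) output format as shown above; the function names `open_ports` and `ssh_reachable` are hypothetical. For anything beyond a quick check, nmap's XML output (`-oX`) is a more robust thing to parse.

```python
import re
from typing import Dict, Set

# "Nmap scan report for svc-13.dc01 (192.168.111.73)" -> "svc-13.dc01"
REPORT_RE = re.compile(r"^Nmap scan report for (\S+)")
# "22/tcp   open  ssh" -> port, protocol, state, service
PORT_RE = re.compile(r"^(\d+)/(tcp|udp)\s+(\S+)\s+(\S+)")


def open_ports(nmap_text: str) -> Dict[str, Set[int]]:
    """Map each scanned host to the set of TCP ports nmap listed as open."""
    result: Dict[str, Set[int]] = {}
    host = None
    for raw in nmap_text.splitlines():
        line = raw.strip()
        m = REPORT_RE.match(line)
        if m:
            # Start a new host section.
            host = m.group(1)
            result[host] = set()
            continue
        m = PORT_RE.match(line)
        if m and host is not None and m.group(2) == "tcp" and m.group(3) == "open":
            result[host].add(int(m.group(1)))
    return result


def ssh_reachable(nmap_text: str) -> Dict[str, bool]:
    """True for hosts where 22/tcp was reported open."""
    return {host: 22 in ports for host, ports in open_ports(nmap_text).items()}
```

Fed the scans above, this would report `True` only for svc-13.dc01 and svc-16.dc01, the two nodes without an HDFS role.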


Solution

  • As mentioned in my answer to "HDFS resiliency to machine restarts in DC/OS":

    Problems were found in a buggy version of the Universe HDFS package for DC/OS. A completely new HDFS package for DC/OS will be released on Universe in the next few weeks.

    https://dcos-community.slack.com/archives/data-services/p1485717889001709

    https://dcos-community.slack.com/archives/data-services/p1485801481001734