I have installed HDFS from Universe on my DC/OS cluster of 10 CoreOS machines (3 master nodes, 7 agent nodes). My HA HDFS configuration has 2 name nodes, 3 journal nodes and 5 data nodes. Over time, every agent node where HDFS is running, whether it is a name node, a journal node or a data node, closes its port 22, so I cannot ssh into it, whereas the nodes where HDFS is not running keep port 22 open. Any idea why this happens, or what I should look for to understand it?
Here is the nmap output for the cluster agent nodes. Nodes 13 and 16 have no HDFS role and keep port 22 open. All the other nodes, which have HDFS installed, have port 22 closed.
root@svr-17:/home/andreab# nmap svc-10.dc01
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:37 GMT
Nmap scan report for svc-10.dc01 (192.168.111.70)
Host is up (0.00061s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
5051/tcp open ida-agent
9000/tcp open cslistener
9003/tcp open unknown
MAC Address: 52:54:00:DE:8E:96 (QEMU Virtual NIC)
Nmap done: 1 IP address (1 host up) scanned in 1.72 seconds
root@svr-17:/home/andreab# nmap svc-11.dc01
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-11.dc01 (192.168.111.71)
Host is up (0.00084s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
5051/tcp open ida-agent
9001/tcp open tor-orport
9002/tcp open dynamid
9003/tcp open unknown
MAC Address: 52:54:00:31:1A:E9 (QEMU Virtual NIC)
Nmap done: 1 IP address (1 host up) scanned in 1.70 seconds
root@svr-17:/home/andreab# nmap svc-12.dc01
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-12.dc01 (192.168.111.72)
Host is up (0.00082s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
5051/tcp open ida-agent
9001/tcp open tor-orport
9002/tcp open dynamid
9003/tcp open unknown
MAC Address: 52:54:00:D9:B4:F7 (QEMU Virtual NIC)
Nmap done: 1 IP address (1 host up) scanned in 1.69 seconds
root@svr-17:/home/andreab# nmap svc-13.dc01
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-13.dc01 (192.168.111.73)
Host is up (0.00025s latency).
Not shown: 998 closed ports
PORT STATE SERVICE
22/tcp open ssh
5051/tcp open ida-agent
MAC Address: 52:54:00:43:96:45 (QEMU Virtual NIC)
Nmap done: 1 IP address (1 host up) scanned in 1.69 seconds
root@svr-17:/home/andreab# nmap svc-14.dc01
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:38 GMT
Nmap scan report for svc-14.dc01 (192.168.111.74)
Host is up (0.00029s latency).
Not shown: 998 closed ports
PORT STATE SERVICE
5051/tcp open ida-agent
9003/tcp open unknown
MAC Address: 52:54:00:77:9D:2E (QEMU Virtual NIC)
Nmap done: 1 IP address (1 host up) scanned in 1.70 seconds
root@svr-17:/home/andreab# nmap svc-15.dc01
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:39 GMT
Nmap scan report for svc-15.dc01 (192.168.111.75)
Host is up (0.00020s latency).
Not shown: 998 closed ports
PORT STATE SERVICE
5051/tcp open ida-agent
9003/tcp open unknown
MAC Address: 52:54:00:B9:03:FA (QEMU Virtual NIC)
Nmap done: 1 IP address (1 host up) scanned in 23.51 seconds
root@svr-17:/home/andreab# nmap svc-16.dc01
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-29 11:39 GMT
Nmap scan report for svc-16.dc01 (192.168.111.76)
Host is up (0.00065s latency).
Not shown: 998 closed ports
PORT STATE SERVICE
22/tcp open ssh
5051/tcp open ida-agent
MAC Address: 52:54:00:E8:D6:07 (QEMU Virtual NIC)
Nmap done: 1 IP address (1 host up) scanned in 1.72 seconds
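To pin down when port 22 closes on each agent, the same check could be repeated on a schedule rather than by hand. What follows is only a minimal sketch, assuming the agents are reachable as svc-10.dc01 through svc-16.dc01 as in the scans above; the names `AGENTS`, `is_port_open` and `check_agents` are hypothetical helpers, not part of DC/OS or nmap. Unlike nmap, it does not tell a closed port apart from an unreachable host.

```python
import socket

# Agent hostnames taken from the nmap scans above (assumed resolvable).
AGENTS = [f"svc-{n}.dc01" for n in range(10, 17)]


def is_port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_agents(hosts, port=22):
    """Map each host to whether the given TCP port accepts connections."""
    return {host: is_port_open(host, port) for host in hosts}


if __name__ == "__main__":
    for host, is_open in check_agents(AGENTS).items():
        state = "open" if is_open else "closed or unreachable"
        print(f"{host}: port 22 {state}")
```

Running this from cron with a timestamp on each line would show whether the port closes right after an HDFS task lands on a node or only some time later.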
As mentioned in my answer to "HDFS resiliency to machine restarts in DC/OS":
Problems were found in a buggy version of the Universe HDFS package for DC/OS. A completely new HDFS package for DC/OS will be released on Universe in the next few weeks.
https://dcos-community.slack.com/archives/data-services/p1485717889001709
https://dcos-community.slack.com/archives/data-services/p1485801481001734