Hazelcast TCP/IP discovery on DCOS/Marathon and Docker


I have a dockerized Dropwizard service deployed on Marathon. I am using Hazelcast as a distributed cache, which I start as part of my Dropwizard service. I have placed a constraint to ensure that each container is started on a unique host.

    "constraints": [
        [
            "hostname",
            "UNIQUE"
        ]
    ],

I have exposed two ports on my Docker container: 10012 for my service and 10013 for Hazelcast. I am using ZooKeeper for my Dropwizard service discovery. Thus, when I start up my Hazelcast instance, I have access to the hostnames of all the machines on which my Docker containers are running, and I add all of them as members, as below.

TcpIpConfig tcpIpConfig = join.getTcpIpConfig();
// finder is a handle to a service discovery service and the following gets me all the hosts on which my docker containers will run.
List<ServiceNode<ShardInfo>> nodes = finder.getAllNodes();
nodes.stream()
     .peek(serviceNode -> log.info("Adding " + serviceNode.representation() + " to hazelcast."))
     .map(serviceNode -> serviceNode.getHost())
     .forEach(host -> tcpIpConfig.addMember(host));
tcpIpConfig.setRequiredMember(null).setEnabled(true);

Now the issues: if I use the BRIDGE network type when deploying on Marathon, then I don't know the Docker container IPs, and thus my two Docker containers cannot find each other. It looks something like this:

ip-10-200-2-219.ap-southeast-1.compute.internal (docker host) - 172.12.1.18 (docker container ip)

ip-10-200-2-220.ap-southeast-1.compute.internal (docker host) - 172.12.1.20 (docker container ip)

From ZooKeeper I get the Docker host IPs but not the Docker container IPs.

If I use the HOST network type, then everything works, but I then have to make sure that ports 10012 and 10013 are always available on every host my Docker containers run on. (With BRIDGE, the container ports are bound to random host ports.)


Solution

  • Analysis:

    The two Docker containers each sit inside their own network, local to their slave. They need to reach each other using the public IP of the slave and the bridged host port to which 5701 (or whatever Hazelcast port you are using) is mapped.

    Solution

    In the network configuration, set the public address and port when starting the instance. Every instance does this, so the instances talk to each other using the Marathon slave IP and the randomized host port assigned to it.

    Use the HOST and PORT_5701 environment variables, which Marathon provides inside the container, to do this.

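    import com.hazelcast.config.Config;

    // Advertise the Marathon host and the mapped host port
    // instead of the container-internal address.
    Config hzConfig = new Config();
    hzConfig.getNetworkConfig().setPublicAddress(
            String.format("%s:%s",
                    System.getenv("HOST"),
                    System.getenv("PORT_5701")));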
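
One caveat with reading the variables directly: if either one is missing (for example, when running outside Marathon), `String.format` will quietly produce `null:null`. Below is a minimal sketch of a defensive helper. The class name `MarathonPublicAddress` and the method `fromEnv` are hypothetical, not part of Hazelcast or Marathon. It assumes Marathon names the port variable after the container port, so a setup that exposes Hazelcast on container port 10013 would presumably look up `PORT_10013` rather than `PORT_5701`.

```java
import java.util.Map;

// Hypothetical helper (not part of Hazelcast or Marathon): builds the
// "host:port" string to pass to NetworkConfig.setPublicAddress(...).
final class MarathonPublicAddress {

    private MarathonPublicAddress() {}

    // env is passed in (rather than read via System.getenv()) so the
    // lookup can be exercised without a real Marathon environment.
    public static String fromEnv(Map<String, String> env, int containerPort) {
        String host = env.get("HOST");
        // Assumption: the mapped host port is exposed as PORT_<containerPort>.
        String portVar = "PORT_" + containerPort;
        String port = env.get(portVar);

        if (host == null || host.isEmpty()) {
            throw new IllegalStateException("HOST is not set; not running under Marathon?");
        }
        if (port == null || port.isEmpty()) {
            throw new IllegalStateException(portVar + " is not set; check the port mapping");
        }

        // Throws NumberFormatException if the value is not a number.
        int parsedPort = Integer.parseInt(port);
        if (parsedPort < 1 || parsedPort > 65535) {
            throw new IllegalStateException(portVar + " is out of range: " + parsedPort);
        }
        return host + ":" + parsedPort;
    }
}
```

With this in place, the call above could become `hzConfig.getNetworkConfig().setPublicAddress(MarathonPublicAddress.fromEnv(System.getenv(), 5701));`, which fails fast at startup instead of advertising an unusable address.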
    

    Refer to the Hazelcast network configuration documentation to understand more about the public address option.