Tags: docker, hadoop, docker-compose, docker-swarm

Hadoop: how to avoid the workers file when Docker assigns automatic container names


Some tools like Hadoop need the names of the workers to be specified explicitly (section "Slaves File" in the docs), but when deploying with Docker Swarm, Swarm assigns automatic container names, so the workers file no longer works: the names listed in it don't exist. Is there any way to avoid this file or, at least, to assign aliases to the containers (independently of the container names) to make it work?
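(For clarity, the workers file, $HADOOP_HOME/etc/hadoop/workers, called slaves in Hadoop 2.x, is just a plain-text list of worker hostnames, one per line. The hostnames below are only examples:)

    nodeworker1
    nodeworker2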

Maybe I can't use a docker-compose.yml file and I must create the services manually over the cluster... Any kind of light on the subject would be really appreciated.


Solution

  • Well, the Hadoop documentation sucks... Apparently, if you set the alias of the master node in the core-site.xml file, you can omit the workers file. These are the steps I followed:

    1. Customized the core-site.xml file (in my docker-compose.yml I gave my master service the name nodemaster; see the compose sketch at the end of this answer). This file must be present on both the master and the worker nodes:
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://nodemaster:9000</value>
        </property>
        <!-- fs.default.name is the deprecated key for the same setting,
             kept here for compatibility with older tooling -->
        <property>
            <name>fs.default.name</name>
            <value>hdfs://nodemaster:9000</value>
        </property>
    </configuration>
    
    2. Now, when you run:
    start-dfs.sh
    start-yarn.sh
    

    it'll connect to the master automatically.
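
    One caveat: the workers file is what start-dfs.sh and start-yarn.sh use to ssh into the worker nodes and launch their daemons. If you omit it, each worker has to start its own daemons, which then register with the master through the address configured above. A minimal sketch, assuming Hadoop 3.x (for the --daemon flag):

    # On each worker node (the master doesn't need this):
    hdfs --daemon start datanode       # registers with the namenode at fs.defaultFS
    yarn --daemon start nodemanager    # registers with the resourcemanager

    # On the master, verify that the datanodes showed up:
    hdfs dfsadmin -report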
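
    For reference, a minimal docker-compose.yml sketch of this kind of stack. Apart from the nodemaster service name, everything here (image name, network name, replica count) is a hypothetical placeholder, not my actual file. The point is that on a Swarm overlay network each service name is DNS-resolvable, so nodemaster in core-site.xml resolves no matter what container names Swarm generates:

    version: "3.7"
    services:
      nodemaster:                # service name = DNS name on the overlay network
        image: my-hadoop:3       # hypothetical image
        hostname: nodemaster     # keep the container's own hostname predictable too
        networks:
          - hadoop-net
      nodeworker:
        image: my-hadoop:3       # hypothetical image
        networks:
          - hadoop-net
        deploy:
          replicas: 3            # workers reach the master at nodemaster:9000
    networks:
      hadoop-net:
        driver: overlay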