Tags: amazon-web-services, docker-compose, docker-swarm, amazon-vpc, mongodb-replica-set

Application deployed to Docker Swarm is not connecting with MongoDB Replica Set


I have deployed my application using Docker Swarm with 3 machines.

The MongoDB Replica Set is configured manually and is running as a service on the Ubuntu machines.

I am trying to connect my Backend application to the MongoDB Replica Set, but I am getting a context deadline exceeded error. I am connecting over private IPs since the machines are in the same AWS VPC. Port 27017 is open in the security group and allows access from IPs within the VPC network.
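
For context, the connection is attempted with a replica set URI of roughly this shape (a sketch with placeholder addresses; replica1 is the replica set name):

    mongodb://<node1-private-ip>:27017,<node2-private-ip>:27017,<node3-private-ip>:27017/?replicaSet=replica1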

/etc/hosts is correctly configured on every machine.

I am using a Docker Compose file to deploy the stack.

The Replica Set itself is working fine; I have verified this by manually inserting a few documents (for example, as in the sketch below).
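
For example, the kind of check I ran from the mongo shell on the primary (the database and collection names are just illustrative):

    > use testdb
    > db.pings.insertOne({ checkedAt: new Date() })
    > db.pings.find()    // the same document shows up on the secondaries once replication has caught up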

The diagram below should help readers understand the context better.

Abbreviations

  • BE = Backend
  • FE = Frontend
  • Mac 1 = Machine 1
  • AZ-1 = Availability Zone 1
  • VPC = Virtual Private Cloud

My guess:
Is it because the Replica Set is not in the Swarm network, and that is why the backend is unable to connect?

I have been trying to fix this issue for quite some time now and have not been successful. Any help is appreciated.

[Architecture diagram of the deployment: Swarm machines and MongoDB replica set nodes inside the AWS VPC]


Solution

  • I found the solution.

    The problem was related to the names of the MongoDB replica instances.

    • The host name for the first member is "host" : "10.0.0.223:27017"
    • The host name for the second member is "host" : "node2:27017"
    • The host name for the third member is "host" : "node3:27017"

    Due to this inconsistency, the backend application was not able to connect to the replica set: MongoDB drivers discover the members using the host values stored in the replica set configuration, so names like node2 and node3 must be resolvable from wherever the client runs (here, inside the Swarm containers).
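
    A quick way to spot this kind of mismatch is to print only the host field of each member from the mongo shell:

    > rs.conf().members.map(function (m) { return m.host; })
    [ "10.0.0.223:27017", "node2:27017", "node3:27017" ]

    The full rs.conf() output was: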

    {
        "_id" : "replica1",
        "version" : 5,
        "term" : 5,
        "protocolVersion" : NumberLong(1),
        "writeConcernMajorityJournalDefault" : true,
        "members" : [
            {
                "_id" : 0,
                "host" : "10.0.0.223:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 1,
                "tags" : {
                    
                },
                "slaveDelay" : NumberLong(0),
                "votes" : 1
            },
            {
                "_id" : 1,
                "host" : "node2:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 1,
                "tags" : {
                    
                },
                "slaveDelay" : NumberLong(0),
                "votes" : 1
            },
            {
                "_id" : 2,
                "host" : "node3:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 1,
                "tags" : {
                    
                },
                "slaveDelay" : NumberLong(0),
                "votes" : 1
            }
        ],
        ...
    }

    Solution

    To solve this, I reconfigured the replica set members to use private IPs. Note that I used private IPs (rather than hostnames) in the replica set configuration.

    First, SSH into the primary machine. Log in to the mongo shell on the primary node and execute the following commands to change the host entry for the 2nd member:

    > cfg = rs.conf()
    > cfg.members[1].host = "10.0.5.242:27017"   // private IP of the second MongoDB instance (node2)
    > rs.reconfig(cfg)
    

    I did the same for the 3rd member's entry (see the sketch below).
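
    The commands for the 3rd member are identical; only the index and the address change (the address below is a placeholder, since the third node's private IP is not listed here):

    > cfg = rs.conf()
    > cfg.members[2].host = "<node3-private-ip>:27017"   // placeholder: private IP of the third MongoDB instance (node3)
    > rs.reconfig(cfg)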

    Here is the documentation link for changing hostnames in a Replica Set:
    https://docs.mongodb.com/manual/tutorial/change-hostnames-in-a-replica-set/
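
    After the reconfig, it is worth double-checking that every member now advertises a private IP, for example from any member's mongo shell:

    > rs.conf().members.forEach(function (m) { print(m._id, m.host) })   // every host should now be a private IP reachable from the Swarm containers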

    Another solution:
    If your MongoDB deployment is fine and everything is set up properly but you are still unable to connect to the DB, try changing the primary instance.

    • SSH into the primary machine.
    • Log in to the mongo shell on the primary node and run rs.stepDown(120) (see the sketch below). This will make another instance the primary and the current instance a secondary.
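
    A minimal sketch of that step, run on the current primary (the argument 120 is the number of seconds the stepped-down member stays ineligible to become primary again):

    replica:PRIMARY> rs.stepDown(120)
    replica:SECONDARY> rs.status().members.map(function (m) { return m.name + " " + m.stateStr })   // confirm which member is PRIMARY now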

    I experienced this problem, and after three days of continuous trial and error, this solution worked for me.

    Hope this helps readers solve a similar problem.