Tags: binding, glusterfs, geo-replication

glusterfs geo-replication - server with two interfaces - private IP advertised


I have been trying to set up geo-replication with GlusterFS servers. Everything worked as expected in my test environment and in my staging environment, but when I tried production I got stuck.

Let's say I have:

The GlusterFS master server is on public IP 1.1.1.1.

The GlusterFS slave is on public IP 2.2.2.2, which sits on interface eth1. The eth0 interface on the slave server has the private IP 192.168.0.1.

So when I run the following command on 1.1.1.1 (firewall and SSH keys are set up properly):

gluster volume geo-replication vol0 2.2.2.2::vol0 create push-pem

I get an error.

Unable to fetch slave volume details. Please check the slave cluster and slave volume. geo-replication command failed

The error message itself is not that important in this case; the problem is the slave IP address:

2015-03-16T11:41:08.101229+00:00 xxx kernel: TCP LOGDROP: IN= OUT=eth0 SRC=1.1.1.1 DST=192.168.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=24243 DF PROTO=TCP SPT=1015 DPT=24007 WINDOW=14600 RES=0x00 SYN URGP=0 

As you can see in the firewall drop log above, port 24007 of the slave gluster daemon is advertised on the private IP of interface eth0 on the slave server, when it should be the public IP on eth1. So the master cannot connect and times out.
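
For reference, the mismatch can be checked with standard tools (slave-hostname below is a placeholder for whatever name the slave reports):

# on the master: what does the slave's name resolve to?
getent hosts slave-hostname

# on the slave: which address is glusterd (port 24007) actually listening on?
ss -tlnp | grep 24007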

Is there a way to force the gluster server to advertise interface eth1, or to bind to it only?

I use CFEngine and Ansible to push configuration, so binding to an interface would be a better solution than binding to an IP, but any working solution will do.
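
For what it's worth, the only daemon-level setting I have found so far is the bind-address option in /etc/glusterfs/glusterd.vol on the slave (shown below trimmed to the relevant lines, with 2.2.2.2 being the slave's public IP); I am not sure whether it also changes the address that gets advertised back to the master, and glusterd needs a restart after editing it:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.bind-address 2.2.2.2
end-volume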

Thank you in advance.


Solution

  • I've encountered this issue but in a different context. I was trying to geo-replicate two nodes which were both behind a NAT (AWS instances in different regions).

    When the master connects to the slave via the public IP to check volume compatibility, size, and other details, it retrieves the hostname of the slave, which usually resolves to something that only has meaning in that remote region.

    Then it uses that hostname to dial back to the slave when later setting up the session, which fails, as that hostname resolves to a private IP in a different region.

    My workaround was to use hostnames when creating the volumes, probing for peers, and establishing geo-replication, and then to add an /etc/hosts entry on the master mapping the slave's hostname (which normally resolves to its private IP) to its public IP instead (example entries are sketched at the end of this answer).

    This gets you to the point where you can establish a session, but I haven't had any luck actually getting it to sync, as it uses the wrong IP somewhere along the way again.

    Edit:

    I've actually managed to get it running by adding /etc/hosts hacks on both sides.
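
    For illustration, the entries look roughly like this, using the IPs from the question and placeholder hostnames; each side maps the other node's name to an address it can actually reach, instead of the private IP it would normally resolve to:

    # /etc/hosts on the master (1.1.1.1)
    2.2.2.2    slave-node

    # /etc/hosts on the slave (2.2.2.2)
    1.1.1.1    master-node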