Search code examples
mariadbubuntu-16.04galera

Galera Cluster for Multi Master: Second node failed to start in Ubuntu 16.04.4 x64


I am trying to test master-master replication between two server by referring to this tutorial: Configure a galera cluster

The first node was started normally after executing galera_new_cluster & get wsrep_cluster_size = 1, but when executed systemctl start mysql for second node, I got an error as listed below.

Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.

Error from "systemctl status mariadb.service":

 Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
 Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'test_cluster' at 'gcomm://206.189.37.50,206.189.39.149': -110 (Connection timed out)
 Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: gcs connect failed: Connection timed out
 Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: wsrep::connect(gcomm://206.189.37.50,206.189.39.149) failed: 7
 Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] Aborting
 Jul 18 08:03:56 mariadb-s2 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
 Jul 18 08:03:56 mariadb-s2 systemd[1]: Failed to start MariaDB 10.1.34 database server.
 Jul 18 08:03:56 mariadb-s2 systemd[1]: mariadb.service: Unit entered failed state.
 Jul 18 08:03:56 mariadb-s2 systemd[1]: mariadb.service: Failed with result 'exit-code'.

Error from "journalctl -xe":

Jul 18 08:03:22 mariadb-s2 mysqld[7734]: WSREP: Recovered position 5820583f-8a43-11e8-8c80-9e4cdd04427e:0
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] /usr/sbin/mysqld (mysqld 10.1.34-MariaDB-1~xenial) starting as process 7956 ...
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Read nil XID from storage engines, skipping position init
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <[email protected]> loaded successfully.
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: CRC-32C: using hardware acceleration.
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Found saved state: 5820583f-8a43-11e8-8c80-9e4cdd04427e:-1, safe_to_bootstrap: 1 
Jul 18 08:03:22 mariadb-s2 mysqld[7734]: WSREP: Recovered position 5820583f-8a43-11e8-8c80-9e4cdd04427e:0
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] /usr/sbin/mysqld (mysqld 10.1.34-MariaDB-1~xenial) starting as process 7956 ...
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Read nil XID from storage engines, skipping position init
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <[email protected]> loaded successfully.
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: CRC-32C: using hardware acceleration.
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Found saved state: 5820583f-8a43-11e8-8c80-9e4cdd04427e:-1, safe_to_bootstrap: 1
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 206.189.39.149; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evi
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: wsrep_sst_grab()
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Start replication
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Setting initial position to 5820583f-8a43-11e8-8c80-9e4cdd04427e:0
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: protonet asio version 0
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: Using CRC-32C for message checksums.
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: backend: asio
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: gcomm thread scheduling priority set to other:0
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: restore pc from disk failed
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: GMCast version 0
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: EVS version 0
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: gcomm: connecting to group 'test_cluster', peer '206.189.37.50:,206.189.39.149:'
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection established to 0a4e92b4 tcp://206.189.39.149:4567
Jul 18 08:03:22 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:22 140559072364800 [Warning] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') address 'tcp://206.189.39.149:4567' points to own listening address, blacklisting
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:25 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://206.189.37.50:4567 timed out, no messages seen in PT3S
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:25 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 0a4e92b4 with addr tcp://206.189.39.149:4567 timed out, no messages seen in PT3S
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:25 140559072364800 [Warning] WSREP: no nodes coming from prim view, prim not possible
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:25 140559072364800 [Note] WSREP: view(view_id(NON_PRIM,0a4e92b4,1) memb {
Jul 18 08:03:25 mariadb-s2 mysqld[7956]:         0a4e92b4,0
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: } joined {
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: } left {
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: } partitioned {
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: })
Jul 18 08:03:25 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:25 140559072364800 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50398S), skipping check
Jul 18 08:03:29 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:29 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://206.189.37.50:4567 timed out, no messages seen in PT3S
Jul 18 08:03:34 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:34 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://206.189.37.50:4567 timed out, no messages seen in PT3S
Jul 18 08:03:38 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:38 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://206.189.37.50:4567 timed out, no messages seen in PT3S
Jul 18 08:03:43 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:43 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://206.189.37.50:4567 timed out, no messages seen in PT3S
Jul 18 08:03:47 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:47 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://206.189.37.50:4567 timed out, no messages seen in PT3S
Jul 18 08:03:49 mariadb-s2 kernel: [UFW BLOCK] IN=eth0 OUT= MAC=e2:57:b6:36:df:64:ec:38:73:0c:78:30:08:00 SRC=123.249.27.191 DST=206.189.39.149 LEN=40 TOS=0x00 PREC=0x00 TTL=113 ID=256 PROTO=TCP SPT=39069 DPT=8888 WINDOW=16384 RES=0x00 S
Jul 18 08:03:52 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:52 140559072364800 [Note] WSREP: (0a4e92b4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://206.189.37.50:4567 timed out, no messages seen in PT3S
Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [Note] WSREP: view((empty))
Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Jul 18 08:03:55 mariadb-s2 mysqld[7956]:          at gcomm/src/pc.cpp:connect():158
Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'test_cluster' at 'gcomm://206.189.37.50,206.189.39.149': -110 (Connection timed out)
Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: gcs connect failed: Connection timed out
Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] WSREP: wsrep::connect(gcomm://206.189.37.50,206.189.39.149) failed: 7
Jul 18 08:03:55 mariadb-s2 mysqld[7956]: 2018-07-18  8:03:55 140559072364800 [ERROR] Aborting
Jul 18 08:03:56 mariadb-s2 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Jul 18 08:03:56 mariadb-s2 systemd[1]: Failed to start MariaDB 10.1.34 database server.

Configuration of Node 1:

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="test_cluster"
wsrep_cluster_address="gcomm://206.189.37.50,206.189.39.149"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="206.189.37.50"
wsrep_node_name="node1"

Configuration of Node 2:

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="test_cluster"
wsrep_cluster_address="gcomm://206.189.37.50,206.189.39.149"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="206.189.39.149"
wsrep_node_name="node2"

I had referred to many similar question in Stack Overflow, but none of the suggestions help. I still getting the same error.


Solution

  • The logs show that the Ubuntu Firewall (UFW) is active so it’s likely that the connections get blocked by the firewall.

    To see if this is the problem, you can run ufw disable. If replication works with ufw disabled, make sure you add rules to allow Connections from node1’s ip to tcp port 4567 on node2 and node2’s ip to tcp port 4567 on node1.

    After you’ve added the rules check the output of ufw list and if everything is fine, run ufw enable to reactivate the firewall.

    You can run man ufw to learn more about UFW configuration.