环境概述

此文将搭建一个两节点的fabric HA集群
环境如下:
主机: centos6.5
MySQL: 5.7.18 多实例

角色 IP port 备注
Fabric localhost 3306 管理节点
node1 localhost 3307 HA成员
node1 localhost 3308 HA成员

安装Fabric

Fabric现在已经合并到 utilities 中了,但是,下载注意官方的提示:
MySQL Fabric is included in MySQL Utilities versions prior to 1.6.2.
1.6.2之前它还在 utilities 里,但是目前最新的GA版本号是1.6.5,并不在里面,我们需要下上一个GA。链接在此:
https://cdn.mysql.com//Downloads/MySQLGUITools/mysql-utilities-1.5.6-1.el6.noarch.rpm
rpm 一下就可以用了。

MySQL配置

重点设置下面四个参数

1
2
3
4
log_bin
gtid-mode=ON
enforce-gtid-consistency
log_slave_updates

Fabric配置

创建账号

每个节点都要创建

1
2
3
create user 'fabric'@'%' identified by '123';
grant all  on *.* to 'fabric'@'%' ;           
flush privileges;

修改MySQL Fabric 配置文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
vim /etc/mysql/fabric.cfg
[storage]
address = 192.168.1.100:3306
user = fabric
password = pass
database = fabric
auth_plugin = mysql_native_password
connection_timeout = 6
connection_attempts = 6
connection_delay = 1

[servers]
user = fabric
password = pass
unreachable_timeout = 5

HA初始化

1
2
3
4
5
6
7
8
9
[root@localhost ~]# mysqlfabric  manage setup 
[INFO] 1493595918.431773 - MainThread - Initializing persister: user (fabric), server (192.168.1.201:3306), database (fabric).
Finishing initial setup
=======================
Password for admin user is not yet set.
Password for admin/xmlrpc: 
Repeat Password: 
Password set.
Password set.

创建HA集群

创建group

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[root@localhost ~]# mysqlfabric group create mysql_ha #创建一个名为mysql_ha的集群
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid finished success result
------------------------------------ -------- ------- ------
1f8d4b07-6d84-48be-9178-da695397af3c        1       1      1

state success          when                                                   description
----- ------- ------------- -------------------------------------------------------------
    3       2    1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad310>.
    4       2    1.4936e+09                             Executing action (_create_group).
    5       2    1.4936e+09                              Executed action (_create_group).

添加成员到group

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
[root@localhost ~]# mysqlfabric group add mysql_ha 192.168.1.201:3307 
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid finished success result
------------------------------------ -------- ------- ------
87ea2237-9c81-4cdb-82d4-e88c2028046f        1       1      1

state success          when                                                   description
----- ------- ------------- -------------------------------------------------------------
    3       2    1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad6d0>.
    4       2    1.4936e+09                               Executing action (_add_server).
    5       2    1.4936e+09                                Executed action (_add_server).


[root@localhost ~]# mysqlfabric group add mysql_ha 192.168.1.201:3308
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid finished success result
------------------------------------ -------- ------- ------
91bff57f-bc1d-4749-89a1-95183f932900        1       1      1

state success          when                                                   description
----- ------- ------------- -------------------------------------------------------------
    3       2    1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad6d0>.
    4       2    1.4936e+09                               Executing action (_add_server).
    5       2    1.4936e+09                                Executed action (_add_server).

提升一个主

有两种方式提升一个成员为主,一是让fabric自己选择,而是手动指定。

  • fabric自选主
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
[root@localhost ~]# mysqlfabric group promote mysql_ha
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid finished success result
------------------------------------ -------- ------- ------
ae00bc31-e27f-42d9-a656-bad4c0cd2921        1       1      1

state success          when                                                   description
----- ------- ------------- -------------------------------------------------------------
    3       2    1.4936e+09  Triggered by <mysql.fabric.events.Event object at 0xec4690>.
    4       2    1.4936e+09                      Executing action (_define_ha_operation).
    5       2    1.4936e+09                       Executed action (_define_ha_operation).
    3       2    1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ae90>.
    4       2    1.4936e+09                      Executing action (_find_candidate_fail).
    5       2    1.4936e+09                       Executed action (_find_candidate_fail).
    3       2    1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ac10>.
    4       2    1.4936e+09                     Executing action (_check_candidate_fail).
    5       2    1.4936e+09                      Executed action (_check_candidate_fail).
    3       2    1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ab50>.
    4       2    1.4936e+09                          Executing action (_wait_slave_fail).
    5       2    1.4936e+09                           Executed action (_wait_slave_fail).
    3       2    1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x1028110>.
    4       2    1.4936e+09                      Executing action (_change_to_candidate).
    5       2    1.4936e+09                       Executed action (_change_to_candidate).
  • 手动指定主
1
 mysqlfabric group promote <group_name> --slave_id='<node_uuid>'

HA健康检查

两种方式,一种是UUID显示方式:

1
2
3
4
5
6
7
8
9
 [root@localhost data3307]# mysqlfabric  group health mysql_ha 
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid is_alive    status is_not_running is_not_configured io_not_running sql_not_running io_error sql_error
------------------------------------ -------- --------- -------------- ----------------- -------------- --------------- -------- ---------
f0c7d668-2a7c-11e7-b3c3-000c29217b03        1   PRIMARY              0                 0              0               0    False     False
fe6d0231-2a7c-11e7-b65e-000c29217b03        1 SECONDARY              0                 0              0               0    False     False

另一种是IP显示方式:

1
2
3
4
5
6
7
8
9
[root@localhost data3307]# mysqlfabric  group lookup_servers mysql_ha
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                         server_uuid            address    status       mode weight
------------------------------------ ------------------ --------- ---------- ------
f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307   PRIMARY READ_WRITE    1.0
fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 SECONDARY  READ_ONLY    1.0

自动故障转移

激活故障探测

整个集群已经搭建完毕,接下来我们要让集群具有发现问题的能力,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[root@localhost data3307]# mysqlfabric group activate mysql_ha
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid finished success result
------------------------------------ -------- ------- ------
39d80744-c811-4e24-a2c8-2b8a5a9005fa        1       1      1

state success          when                                                   description
----- ------- ------------- -------------------------------------------------------------
    3       2   1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x101aa90>.
    4       2   1.49365e+09                           Executing action (_activate_group).
    5       2   1.49365e+09                            Executed action (_activate_group).

模拟故障

现在 master=3307 slave=3308,我们手动关闭3307,看HA故障转移能力

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[root@localhost data3307]# mysqld_multi stop 3307
[root@localhost data3307]# mysqld_multi report
Reporting MySQL servers
MySQL server from group: mysqld3306 is running
MySQL server from group: mysqld3307 is not running #3307已经stop
MySQL server from group: mysqld3308 is running
MySQL server from group: mysqld3309 is running  
[root@localhost data3307]# mysqlfabric group lookup_servers mysql_ha 
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                         server_uuid            address  status       mode weight
------------------------------------ ------------------ ------- ---------- ------
f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307  FAULTY READ_WRITE    1.0 #3307已经stop
fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 PRIMARY READ_WRITE    1.0

可以看到master已经转移到了3308 。But,当我们重启3307之后,3307并不能自动加入集群,需要我们手动做些工作。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
[root@localhost data3307]# mysqlfabric server set_status f0c7d668-2a7c-11e7-b3c3-000c29217b03 spare
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid finished success result
------------------------------------ -------- ------- ------
3a2ac6b2-5a79-4ab8-9d3d-a197332ee711        1       1      1

state success          when                                                   description
----- ------- ------------- -------------------------------------------------------------
    3       2   1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x10ada50>.
    4       2   1.49365e+09                        Executing action (_set_server_status).
    5       2   1.49365e+09                         Executed action (_set_server_status).


[root@localhost data3307]# mysqlfabric server set_status f0c7d668-2a7c-11e7-b3c3-000c29217b03 secondary
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                                uuid finished success result
------------------------------------ -------- ------- ------
8fa3a763-aa48-489d-95a4-52f5323d0bbc        1       1      1

state success          when                                                   description
----- ------- ------------- -------------------------------------------------------------
    3       2   1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x10ada50>.
    4       2   1.49365e+09                        Executing action (_set_server_status).
    5       2   1.49365e+09                         Executed action (_set_server_status).

接着查看集群状态:

1
2
3
4
5
6
7
8
9
[root@localhost data3307]# mysqlfabric group lookup_servers mysql_ha
Password for admin: 
Fabric UUID:  5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

                         server_uuid            address    status       mode weight
------------------------------------ ------------------ --------- ---------- ------
f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307 SECONDARY  READ_ONLY    1.0
fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308   PRIMARY READ_WRITE    1.0

至此,3307已经重新加入ha,成为slave。以此类推,将其他节点下线,再上线,也能达到同样效果,不再演示。至此,我们模拟了两个节点分别停机,上线,fabric的自动故障转移功能。

配置过程的报错

问题1

Q:在提升主的时候,由于要选择的主此前purge掉了一些binlog
报错: 

1
2
3
 Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting \
using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs\
containing GTIDs that the slave requires.'

A:手动将slave指向master

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
master:
    mysql> show master status\G;
*************************** 1. row ***************************
             File: mysql-bin.000025
         Position: 393
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: d54fe35a-2a7c-11e7-b24b-000c29217b03:1-73,
f0c7d668-2a7c-11e7-b3c3-000c29217b03:1-16
1 row in set (0.00 sec)

slave:
mysql> stop slave ;
Query OK, 0 rows affected (0.01 sec)

mysql> reset slave ;
Query OK, 0 rows affected (0.09 sec)

mysql> reset master;
Query OK, 0 rows affected (0.02 sec)

mysql> set global gtid_purged="d54fe35a-2a7c-11e7-b24b-000c29217b03:1-73,f0c7d668-2a7c-11e7-b3c3-000c29217b03:1-16";
Query OK, 0 rows affected (0.00 sec)

mysql>  change master to master_host='localhost',master_port=3307,master_user='fabric',master_password='123',master_auto_position=1;    
Query OK, 0 rows affected, 2 warnings (0.05 sec)

mysql> start slave;
Query OK, 0 rows affected (0.02 sec)

问题2

Q:在上个问题的处理过程中,曾试图跳过部分事务,所以设置了GTID_NEXT参数,然后就出现如下报错

1
2
3
4
5
6
7
mysql> stop slave ;
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11'.
mysql> start slave ;
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11'.

mysql> reset slave ;
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11'.

A:

1
2
mysql> set gtid_next='automatic';
Query OK, 0 rows affected (0.01 sec)

问题3

Q:当3307下线,从新上线后,设置spare的时候,报错

1
  Last_SQL_Error: Slave failed to initialize relay log info structure from the repository

**  A:**
  需要在3307

1
reset slave all;