环境概述
此文将搭建一个两节点的fabric HA集群 环境如下: 主机: centos6.5 MySQL: 5.7.18 多实例
角色
IP
port
备注
Fabric
localhost
3306
管理节点
node1
localhost
3307
HA成员
node1
localhost
3308
HA成员
安装Fabric Fabric现在已经合并到 utilities 中了,但是,下载注意官方的提示:MySQL Fabric is included in MySQL Utilities versions prior to 1.6.2. 1.6.2之前它还在 utilities 里,但是目前最新的GA版本号是1.6.5,并不在里面,我们需要下上一个GA。链接在此:https://cdn.mysql.com//Downloads/MySQLGUITools/mysql-utilities-1.5.6-1.el6.noarch.rpm rpm 一下就可以用了。
MySQL配置 重点设置下面四个参数
1 2 3 4 log_bin gtid-mode=ON enforce-gtid-consistency log_slave_updates
Fabric配置 创建账号 每个节点都要创建
1 2 3 create user 'fabric' @'%' identified by '123' ; grant all on *.* to 'fabric' @'%' ; flush privileges;
修改MySQL Fabric 配置文件 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 vim /etc/mysql/fabric.cfg [storage] address = 192.168.1.100:3306 user = fabric password = pass database = fabric auth_plugin = mysql_native_password connection_timeout = 6 connection_attempts = 6 connection_delay = 1 [servers] user = fabric password = pass unreachable_timeout = 5
HA初始化 1 2 3 4 5 6 7 8 9 [root@localhost ~] [INFO] 1493595918.431773 - MainThread - Initializing persister: user (fabric), server (192.168.1.201:3306), database (fabric). Finishing initial setup ======================= Password for admin user is not yet set . Password for admin/xmlrpc: Repeat Password: Password set . Password set .
创建HA集群 创建group 1 2 3 4 5 6 7 8 9 10 11 12 13 14 [root@localhost ~] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid finished success result ------------------------------------ -------- ------- ------ 1f8d4b07-6d84-48be-9178-da695397af3c 1 1 1 state success when description ----- ------- ------------- ------------------------------------------------------------- 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad310>. 4 2 1.4936e+09 Executing action (_create_group). 5 2 1.4936e+09 Executed action (_create_group).
添加成员到group 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 [root@localhost ~] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid finished success result ------------------------------------ -------- ------- ------ 87ea2237-9c81-4cdb-82d4-e88c2028046f 1 1 1 state success when description ----- ------- ------------- ------------------------------------------------------------- 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad6d0>. 4 2 1.4936e+09 Executing action (_add_server). 5 2 1.4936e+09 Executed action (_add_server). [root@localhost ~] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid finished success result ------------------------------------ -------- ------- ------ 91bff57f-bc1d-4749-89a1-95183f932900 1 1 1 state success when description ----- ------- ------------- ------------------------------------------------------------- 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad6d0>. 4 2 1.4936e+09 Executing action (_add_server). 5 2 1.4936e+09 Executed action (_add_server).
提升一个主 有两种方式提升一个成员为主,一是让fabric自己选择,而是手动指定。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 [root@localhost ~] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid finished success result ------------------------------------ -------- ------- ------ ae00bc31-e27f-42d9-a656-bad4c0cd2921 1 1 1 state success when description ----- ------- ------------- ------------------------------------------------------------- 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0xec4690>. 4 2 1.4936e+09 Executing action (_define_ha_operation). 5 2 1.4936e+09 Executed action (_define_ha_operation). 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ae90>. 4 2 1.4936e+09 Executing action (_find_candidate_fail). 5 2 1.4936e+09 Executed action (_find_candidate_fail). 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ac10>. 4 2 1.4936e+09 Executing action (_check_candidate_fail). 5 2 1.4936e+09 Executed action (_check_candidate_fail). 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ab50>. 4 2 1.4936e+09 Executing action (_wait_slave_fail). 5 2 1.4936e+09 Executed action (_wait_slave_fail). 3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x1028110>. 4 2 1.4936e+09 Executing action (_change_to_candidate). 5 2 1.4936e+09 Executed action (_change_to_candidate).
1 mysqlfabric group promote <group_name> --slave_id='<node_uuid>'
HA健康检查 两种方式,一种是UUID显示方式:
1 2 3 4 5 6 7 8 9 [root@localhost data3307] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid is_alive status is_not_running is_not_configured io_not_running sql_not_running io_error sql_error ------------------------------------ -------- --------- -------------- ----------------- -------------- --------------- -------- --------- f0c7d668-2a7c-11e7-b3c3-000c29217b03 1 PRIMARY 0 0 0 0 False False fe6d0231-2a7c-11e7-b65e-000c29217b03 1 SECONDARY 0 0 0 0 False False
另一种是IP显示方式:
1 2 3 4 5 6 7 8 9 [root@localhost data3307] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 server_uuid address status mode weight ------------------------------------ ------------------ --------- ---------- ------ f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307 PRIMARY READ_WRITE 1.0 fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 SECONDARY READ_ONLY 1.0
自动故障转移 激活故障探测 整个集群已经搭建完毕,接下来我们要让集群具有发现问题的能力,
1 2 3 4 5 6 7 8 9 10 11 12 13 14 [root@localhost data3307] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid finished success result ------------------------------------ -------- ------- ------ 39d80744-c811-4e24-a2c8-2b8a5a9005fa 1 1 1 state success when description ----- ------- ------------- ------------------------------------------------------------- 3 2 1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x101aa90>. 4 2 1.49365e+09 Executing action (_activate_group). 5 2 1.49365e+09 Executed action (_activate_group).
模拟故障 现在 master=3307 slave=3308,我们手动关闭3307,看HA故障转移能力
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 [root@localhost data3307] [root@localhost data3307] Reporting MySQL servers MySQL server from group: mysqld3306 is running MySQL server from group: mysqld3307 is not running MySQL server from group: mysqld3308 is running MySQL server from group: mysqld3309 is running [root@localhost data3307] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 server_uuid address status mode weight ------------------------------------ ------------------ ------- ---------- ------ f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307 FAULTY READ_WRITE 1.0 fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 PRIMARY READ_WRITE 1.0
可以看到master已经转移到了3308 。But,当我们重启3307之后,3307并不能自动加入集群,需要我们手动做些工作。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 [root@localhost data3307] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid finished success result ------------------------------------ -------- ------- ------ 3a2ac6b2-5a79-4ab8-9d3d-a197332ee711 1 1 1 state success when description ----- ------- ------------- ------------------------------------------------------------- 3 2 1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x10ada50>. 4 2 1.49365e+09 Executing action (_set_server_status). 5 2 1.49365e+09 Executed action (_set_server_status). [root@localhost data3307] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 uuid finished success result ------------------------------------ -------- ------- ------ 8fa3a763-aa48-489d-95a4-52f5323d0bbc 1 1 1 state success when description ----- ------- ------------- ------------------------------------------------------------- 3 2 1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x10ada50>. 4 2 1.49365e+09 Executing action (_set_server_status). 5 2 1.49365e+09 Executed action (_set_server_status).
接着查看集群状态:
1 2 3 4 5 6 7 8 9 [root@localhost data3307] Password for admin: Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e Time-To-Live: 1 server_uuid address status mode weight ------------------------------------ ------------------ --------- ---------- ------ f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307 SECONDARY READ_ONLY 1.0 fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 PRIMARY READ_WRITE 1.0
至此,3307已经重新加入ha,成为slave。以此类推,将其他节点下线,再上线,也能达到同样效果,不再演示。至此,我们模拟了两个节点分别停机,上线,fabric的自动故障转移功能。
配置过程的报错 问题1 Q:在提升主的时候,由于要选择的主此前purge掉了一些binlog 报错:
1 2 3 Got fatal error 1236 from master when reading data from binary log : 'The slave is connecting \ using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs\ containing GTIDs that the slave requires.'
A:手动将slave指向master
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 master: mysql> show master status\G; *************************** 1. row *************************** File: mysql-bin.000025 Position: 393 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: d54fe35a-2a7c-11e7-b24b-000c29217b03:1-73, f0c7d668-2a7c-11e7-b3c3-000c29217b03:1-16 1 row in set (0.00 sec) slave: mysql> stop slave ; Query OK, 0 rows affected (0.01 sec) mysql> reset slave ; Query OK, 0 rows affected (0.09 sec) mysql> reset master; Query OK, 0 rows affected (0.02 sec) mysql> set global gtid_purged="d54fe35a-2a7c-11e7-b24b-000c29217b03:1-73,f0c7d668-2a7c-11e7-b3c3-000c29217b03:1-16" ; Query OK, 0 rows affected (0.00 sec) mysql> change master to master_host='localhost' ,master_port=3307,master_user='fabric' ,master_password='123' ,master_auto_position=1; Query OK, 0 rows affected, 2 warnings (0.05 sec) mysql> start slave; Query OK, 0 rows affected (0.02 sec)
问题2 Q:在上个问题的处理过程中,曾试图跳过部分事务,所以设置了GTID_NEXT参数,然后就出现如下报错
1 2 3 4 5 6 7 mysql> stop slave ; ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11' . mysql> start slave ; ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11' . mysql> reset slave ; ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11' .
A:
1 2 mysql> set gtid_next='automatic' ; Query OK, 0 rows affected (0.01 sec)
问题3 Q:当3307下线,从新上线后,设置spare的时候,报错
1 Last_SQL_Error: Slave failed to initialize relay log info structure from the repository
** A:** 需要在3307