The following is the official description of MHA's masterha_master_switch, roughly translated here for reference when needed.

masterha_master_switch

masterha_master_switch can be used for master failover, and also for switching the master online.

Manual Failover

Sometimes you need to fail over manually. The following command does that:

$ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host1

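The available arguments are:

  • --master_state=dead

    Mandatory. Accepted values are “dead” or “alive”. When set to “alive”, an online master switch is performed instead, as described later.

  • --dead_master_host=(hostname)

    Mandatory. --dead_master_ip and --dead_master_port can also be specified.

  • --new_master_host=(hostname)

    Optional. If not set, MHA elects the new master automatically.

  • --interactive=(0|1)

    1 means interactive mode (the default); 0 means non-interactive.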

  • --ssh_reachable=(0|1|2)

    Specifies whether the master is reachable via SSH. Set 0 if not reachable, 1 if reachable, 2 if unknown. The default is 2. When 2 is set, the command checks internally whether the master is reachable via SSH and updates its internal SSH status to 0 or 1. If the master is reachable via SSH and master_ip_failover_script or shutdown_script is set, the command passes --command=stopssh; otherwise masterha_master_switch passes --command=stop. In addition, if the crashed master is reachable via SSH, the failover script tries to copy unsent binary logs from the crashed master.

  • --skip_change_master

    When this argument is set on master failover, MHA exits after applying differential relay logs and skips CHANGE MASTER and START SLAVE, so slaves do not point to the new master. This may help if you want to manually double-check whether slave recovery succeeded.

  • --skip_disable_read_only

    When this parameter is passed, MHA skips executing SET GLOBAL read_only=0 on the new master. This may be useful when you want to disable read_only manually later.

  • --last_failover_minute=(minutes)

    Same as masterha_manager.

  • --ignore_last_failover

    Same as masterha_manager.

  • --wait_on_failover_error=(seconds)

    Same as masterha_manager.

    Note that this parameter applies to automated/non-interactive failover only, not to interactive failover. That is, unless --interactive=0 is set, wait_on_failover_error is simply ignored and MHA does not sleep on errors.

  • --remove_dead_master_conf

    Same as masterha_manager.

    masterha_master_switch runs interactive failover procedures by default. You need to type “yes” from the keyboard as below.

    …
    Starting master switch from host1(192.168.0.1:3306) to host2(192.168.0.2:3306)? (yes/NO): yes
    …

    If --new_master_host is not set, the new master is determined by the same rule as automated failover. When you run manual failover, you have the option to set the new master explicitly. Below is an example.

    Starting master switch from host1(192.168.0.1:3306) to host2(192.168.0.2:3306)? (yes/NO): no
    Continue? (yes/NO): yes
    Enter new master host name: host5
    Master switch to gd1305(10.17.1.238:3306). OK? (yes/NO): yes
    …

    In this case, host5 will be new master, as long as binary logging is enabled and major version is not higher than other slaves. The below command has the same effect as the above.

    $ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host1 --new_master_host=host5

  • --wait_until_gtid_in_sync=(0|1)

    This option is available since 0.56.

    When doing GTID-based failover with wait_until_gtid_in_sync=1, MHA waits until the slaves catch up with the new master's GTID. With 0, MHA does not wait for the slaves to catch up. The default is 1.

  • --skip_change_master

    This option is available since 0.56.

    If this option is set, MHA skips executing CHANGE MASTER.

  • --skip_disable_read_only

    This option is available since 0.56.

    If this option is set, MHA skips executing SET GLOBAL read_only=0 on the new master.

  • --ignore_binlog_server_error

    This option is available since 0.56.

    If this option is set, MHA ignores any error from binlog servers during failover.

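Examples

  • The old master is confirmed dead, and the new master is specified explicitly

masterha_master_switch --conf=/etc/app1.cnf --master_state=dead --dead_master_host=host1 --new_master_host=host2

  • The old master is confirmed dead, and MHA elects the new master automatically

masterha_master_switch --conf=/etc/app1.cnf --master_state=dead --dead_master_host=host1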
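The two examples above differ only in whether --new_master_host is passed. As a rough sketch, a small wrapper could compose the command for a non-interactive run and print it for review before anyone executes it. The function name build_failover_cmd is made up for illustration and is not part of MHA; only the flags already described above are used.

```shell
#!/bin/sh
# Sketch only: build_failover_cmd is a hypothetical helper, not an MHA tool.
# Usage: build_failover_cmd CONF DEAD_HOST [NEW_MASTER_HOST]
build_failover_cmd() {
    conf=$1
    dead_host=$2
    new_host=${3:-}

    # --interactive=0 skips the yes/NO prompts described above.
    cmd="masterha_master_switch --master_state=dead --conf=$conf --dead_master_host=$dead_host --interactive=0"

    # Pin the new master only when the caller names one;
    # otherwise MHA elects it by its usual rule.
    if [ -n "$new_host" ]; then
        cmd="$cmd --new_master_host=$new_host"
    fi
    printf '%s\n' "$cmd"
}

# Print the command instead of running it, so it can be checked first.
build_failover_cmd /etc/app1.cnf host1 host2
```

Printing rather than executing is a deliberate choice here: a failover is hard to undo, so a review step before running the command seems worth the extra line.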

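Scheduled (Online) Master Switch

Sometimes we want to switch the master on a schedule, even while the old master is still running. Typical cases are operations that require downtime, such as replacing hardware or upgrading the server. In those cases the master can be switched online.

For example:

$ masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --new_master_host=host2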

--master_state=alive must be set. The program flow for a scheduled master switch is slightly different from that of master failover. For example, you do not need to power off the master server, but you do need to make sure that write queries are not executed on the master. By setting master_ip_online_change_script, you can control how to disallow write traffic on the current master (e.g. dropping writable users, setting read_only=1) before FLUSH TABLES WITH READ LOCK is executed, and how to allow write traffic on the new master.
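To make the role of master_ip_online_change_script more concrete, here is a minimal sketch of the dispatch such a script might perform. It assumes MHA calls the script with a --command= argument whose values are stop (or stopssh) for the current master and start for the new master, plus host arguments such as --orig_master_host; those names are assumptions to be checked against the MHA documentation. The helper plan_online_change is hypothetical and only prints the statement it would run, without connecting to MySQL.

```shell
#!/bin/sh
# Sketch only: plan_online_change is a hypothetical helper.
# The --command values (stop/stopssh/start) are assumptions about how
# MHA invokes master_ip_online_change_script; verify before relying on them.
plan_online_change() {
    action=""
    for arg in "$@"; do
        case $arg in
            --command=*) action=${arg#--command=} ;;
        esac
    done

    case $action in
        stop|stopssh)
            # Disallow writes on the current master before tables are locked.
            echo "SET GLOBAL read_only=1;"
            ;;
        start)
            # Allow writes on the new master.
            echo "SET GLOBAL read_only=0;"
            ;;
        *)
            echo "unknown command: $action" >&2
            return 1
            ;;
    esac
}

# Dry run: show what the freeze-writes phase would do.
plan_online_change --command=stop --orig_master_host=host1
```

A real script would also have to handle things this sketch leaves out, such as dropping or restoring writable users and moving a virtual IP.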

Online master switch starts only when all of the following conditions are met.

  • IO threads on all slaves are running
  • SQL threads on all slaves are running
  • Seconds_Behind_Master on all slaves is less than or equal to --running_updates_limit seconds
  • On the master, no update query takes more than --running_updates_limit seconds in the SHOW PROCESSLIST output
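The three slave-side conditions can be checked by hand before starting a switch. The sketch below reads the text of SHOW SLAVE STATUS\G on standard input and succeeds only if both threads report Yes and the lag is within the limit. check_slave_status is a hypothetical helper, not something MHA ships, and it does not cover the fourth condition (long-running updates on the master).

```shell
#!/bin/sh
# Sketch only: check_slave_status is a hypothetical pre-check helper.
# Usage: mysql -h SLAVE -e 'SHOW SLAVE STATUS\G' | check_slave_status LIMIT
# Exit status 0 means the slave meets the three slave-side conditions.
check_slave_status() {
    awk -v limit="$1" '
        $1 == "Slave_IO_Running:"      { io  = $2 }
        $1 == "Slave_SQL_Running:"     { sql = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2 }
        END {
            # A NULL or missing lag fails the numeric pattern and is rejected.
            ok = (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= limit + 0)
            exit (ok ? 0 : 1)
        }
    '
}

# Demo with canned output instead of a live server.
sample='Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0'

if printf '%s\n' "$sample" | check_slave_status 1; then
    echo "slave ok"
else
    echo "slave not ready"
fi
```

Running it once per slave with the same value you plan to pass as --running_updates_limit gives an early hint of whether the switch would abort.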

These restrictions exist for safety, and so that the switch to the new master completes as quickly as possible. masterha_master_switch takes the arguments below when switching the master online.

  • --new_master_host=(hostname)

    The new master's hostname.

  • --orig_master_is_new_slave

    After the master switch completes, the previous master runs as a slave of the new master. By default this is disabled (the previous master does not join the new replication environment). If you use this option, you need to set the repl_password parameter in the config file, because the current master does not know the replication password for the new master.

  • --running_updates_limit=(seconds)

    If the current master is executing write queries that take longer than this value, or if any of the MySQL slaves is behind the master by more than this value, the master switch aborts. The default is 1 (1 second).

  • --remove_orig_master_conf

    When this option is set and the master switch succeeds, MHA Manager automatically removes the section of the dead master from the configuration file. By default, the configuration file is not modified at all.

  • --skip_lock_all_tables

    When doing a master switch, MHA runs FLUSH TABLES WITH READ LOCK on the orig master to make sure updates have really stopped. FLUSH TABLES WITH READ LOCK is very expensive, however, so if you can make sure that no updates are reaching the orig master (by killing all clients in master_ip_online_change_script, etc.), you may want to avoid locking tables by using this argument.

More examples

  • Switch the master from host1 to host2; after the switch, host1 keeps running as a slave in the new environment

masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=host2 --orig_master_is_new_slave --running_updates_limit=5

  • Switch the master from host1 to host2; after the switch, host1 no longer takes part in replication in the new environment

masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=host2 --running_updates_limit=5