mysql高可用架构mha搭建及测试故障转移
mha项目网站
https://code.google.com/p/mysql-master-ha/
一.环境介绍
1.主机部署
manager机:10.10.54.154
master机:10.10.54.156
slave1机:10.10.54.155(备用master)
slave2机:10.10.54.157
2.大致步骤
a.首先用ssh-keygen实现四台主机之间相互免密钥登录
b.安装mhamha4mysql-node,mha4mysql-manager 软件包
c.建立master,slave1,slave2之间主从复制
d.管理机manager上配置mha文件
e.masterha_check_ssh工具验证ssh信任登录是否成功
f.masterha_check_repl工具验证mysql复制是否成功
g.启动mha manager,并监控日志文件
h.测试master(156)宕机后,是否会自动切换
3.说明:下面中括号中的主机名说明了当前操作是在哪台机子上进行的
二.首先用ssh-keygen实现四台主机之间相互免密钥登录
[manager机]
shell> ssh-keygen -t rsa -b 2048
shell> scp-copy-id root@10.10.54.155
shell> scp-copy-id root@10.10.54.156
shell> scp-copy-id root@10.10.54.157
在另外三台机子重复此步骤,使四台机子中的任何两台之间可以免密码登录
三.安装mhamha4mysql-node,mha4mysql-manager 软件包
1.四台主机上安装mhamha4mysql-node
12345 [manager,master,slave1,slave2]shell> yum updateshell> yum -y install perl-dbd-mysql ncftpshell> wget http://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.53-0.noarch.rpmsehll> rpm -ivh mha4mysql-node-0.53-0.noarch.rpm
2.在manager机子上安装mha4mysql-manager
123456789101112131415161718192021 [manager]shell> yum install perlshell> yum install cpanshell> rpm -ivh mha4mysql-manager-0.53-0.el6.noarch.rpmerror:perl(config::tiny) is needed by mha4mysql-manager-0.53-0.noarchperl(log::dispatch) is needed by mha4mysql-manager-0.53-0.noarchperl(log::dispatch::file) is needed by mha4mysql-manager-0.53-0.noarchperl(log::dispatch::screen) is needed by mha4mysql-manager-0.53-0.noarchperl(parallel::forkmanager) is needed by mha4mysql-manager-0.53-0.noarchperl(time::hires) is needed by mha4mysql-manager-0.53-0.noarch[solution]shell> wget ftp://ftp.muug.mb.ca/mirror/centos/5.10/os/x86_64/centos/perl-5.8.8-41.el5.x86_64.rpmshell> wget ftp://ftp.muug.mb.ca/mirror/centos/6.5/os/x86_64/packages/compat-db43-4.3.29-15.el6.x86_64.rpmshell> wget http://downloads.naulinux.ru/pub/naulinux/6x/i386/sites/school/rpms/perl-log-dispatch-2.27-1.el6.noarch.rpmshell> wget http://dl.fedoraproject.org/pub/epel/6/i386/perl-parallel-forkmanager-0.7.9-1.el6.noarch.rpmshell> wget http://dl.fedoraproject.org/pub/epel/6/i386/perl-mail-sender-0.8.16-3.el6.noarch.rpmshell> wget http://dl.fedoraproject.org/pub/epel/6/i386/perl-mail-sendmail-0.79-12.el6.noarch.rpmshell> wget http://mirror.centos.org/centos/6/os/x86_64/packages/perl-time-hires-1.9721-136.el6.x86_64.rpmshell> rpm -ivh perl-parallel-forkmanager-0.7.9-1.el6.noarch.rpm perl-log-dispatch-2.27-1.el6.noarch.rpm perl-mail-sender-0.8.16-3.el6.noarch.rpm perl-mail-sendmail-0.79-12.el6.noarch.rpm perl-time-hires-1.9721-136.el6.x86_64.rpmshell> rpm -ivh mha4mysql-manager-0.53-0.el6.noarch.rpm
四.建立master,slave1,slave2之间主从复制
12345678910111213141516171819 [master:156]1.shell> vim /etc/my.cnf#server-id 改为1server-id=1log-bin=mysql-binbinlog_format=mixed#授权操作2.mysql> grant all privileges on *.* to 'rep'@'10.10.54.%' identified by 'rep123';mysql> flush privileges;3.mysql> show master status;[slave1,slave2]4.change master操作mysql> change master tomaster_host='10.10.54.156',master_port=3306,master_user='rep',master_password='rep123',master_log_file='mysql-bin.000001',master_log_pos=112;
注意:slave1机子上也要授权,因为这个是备用master
[slave1:155]
5.mysql> grant all privileges on *.* to 'rep'@'10.10.54.%' identified by 'rep123';
[master,slave1,slave2]
6.查看主从复制是否成功的一些命令
mysql> start slave;
mysql> stop slave;
mysql> reset slave;
mysql> show slave status\g;
五.所有主机上设置复制权限帐号
mysql> grant all privileges on *.* to 'mha_rep'@'10.10.2.10' identified by '123456';
六.manager上配置mha文件,管理各个节点
[manager:154]
shell> mkdir -p /masterha/app1
shell> mkdir /etc/masterha
shell> vim /etc/masterha/app1.cnf
12345678910111213141516171819202122232425 [server default]user=mha_rep ##mysql管理用戶名password=123456manager_workdir=/masterha/app1 #目录可以任意指定manager_log=/masterha/app1/manager.logremote_workdir=/masterha/app1ssh_user=root #ssh免密钥登录的帐号名repl_user=rep #mysql复制帐号,用来在主从机之间同步二进制日志等repl_password=rep123ping_interval=1 #ping间隔时间,用来检测master是否正常[server1]hostname=10.10.54.155#ssh_port=9999master_binlog_dir=/data/ndb #mysql数据库目录candidate_master=1 #master机宕掉后,优先启用这台作为新master[server2]hostname=10.10.54.156#ssh_port=9999master_binlog_dir=/data/ndbcandidate_master=1[server3]hostname=10.10.54.157#ssh_port=9999master_binlog_dir=/data/ndbno_master=1 #设置no_master=1使主机不能成为新master
七.验证ssh信任登陆和mysql主从复制是否成功
1.masterha_check_ssh 验证ssh信任登陆
1234 [manager:154]shell> masterha_check_ssh --conf=/etc/masterha/app1.cnfsun mar 2 17:45:38 2014 - [debug] ok.sun mar 2 17:45:38 2014 - [info] all ssh connection tests passed successfully.
2.masterha_check_repl 验证mysql复制是否成功
1234567891011 [manager:154]shell> masterha_check_repl --conf=/etc/masterha/app1.cnf---------------------------------------------------------sun mar 2 13:16:57 2014 - [info] slaves settings check done.sun mar 2 13:16:57 2014 - [info]10.10.54.156 (current master) +--10.10.54.155 +--10.10.54.157...mysql replication health is ok.---------------------------------------------------------------
八.启动mha manager,并监控日志文件
12345678910 [manager:154]shell> nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1shell> tail -f /masterha/app1/manager.log--------------------------------------------------------------- 10.10.54.156 (current master) +--10.10.54.155 +--10.10.54.157...sun mar 2 13:09:25 2014 - [info] ping(select) succeeded, waiting until mysql doesn't respond..-----------------------------------------------------------------
监控的manager.log文件表明mha运行良好,正在 waiting until mysql doesn't respond
九.测试master(156)宕机后,是否会自动切换
1.测试自动切换是否成功
当掉master机子
shell> /etc/init.d/myqld stop
当掉master后,manager上的监控文件/masterha/app1/manager.log显示错误信息,表示不能自动切换:
[error]
-----------------------------------------------------------
sun mar 2 13:13:46 2014 - [error][/usr/share/perl5/vendor_perl/mha/managerutil.pm, ln178] got error: use of uninitialized value $msg in scalar chomp at /usr/share/perl5/vendor_perl/mha/managerconst.pm line 90.
-----------------------------------------------------------
解决这个错误是在文件/usr/share/perl5/vendor_perl/mha/managerconst.pm 第90行(chomp $msg)前加入一行:
1 $msg = unless($msg);
好了,错误解决了,下面我们再次重复上面步骤:
master上mysql服务:shell> /etc/init.d/mysqld stop
再次查看manager机子上监控文件内容
shell> tail -f tail -f /masterha/app1/manager.log
日志文件显示:
-----------------------------------------------------------
----- failover report -----
app1: mysql master failover 10.10.54.156 to 10.10.54.155 succeeded
master 10.10.54.156 is down!
check mha manager logs at mycentos4:/masterha/app1/manager.log for details.
started automated(non-interactive) failover.
the latest slave 10.10.54.155(10.10.54.155:3306) has all relay logs for recovery.
selected 10.10.54.155 as a new master.
10.10.54.155: ok: applying all logs succeeded.
10.10.54.157: this host has the latest relay log events.
generating relay diff files from the latest slave succeeded.
10.10.54.157: ok: applying all logs succeeded. slave started, replicating from 10.10.54.155.
10.10.54.155: resetting slave info succeeded.
master failover to 10.10.54.155(10.10.54.155:3306) completed successfully.
--------------------------------------------------------
2.切换成功后,检查replication状态
[master:156]
shell> /etc/init.d/mysqld start
[manager:154]
shell> masterha_check_repl --conf=/etc/masterha/app1.cnf
--------------------------------------------------------------
sun mar 2 13:22:11 2014 - [info] slaves settings check done.
sun mar 2 13:22:11 2014 - [info]
10.10.54.155 (current master)
+--10.10.54.156
+--10.10.54.157
...
mysql replication health is ok.
---------------------------------------------------------------
上面的10.10.54.155 (current master) 这句表明master成功切换到155机子上
十.上一步测试之后,新master机为155,宕掉155机子,再次测试故障转移
1.启动管理节点
shell> nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1
2.启动日志检测,然后当掉新master(155),然后查看监控文件变化
shell> tail -f /masterha/app1/manager.log
3.当掉155机子(即新的master)
shell> /etc/init.d/mysqld stop
4.查看manager主机上的监控文件变化
[error][/usr/share/perl5/vendor_perl/mha/masterfailover.pm, ln295] last failover was done at 2014/03/02 13:02:47. current time is too early to do failover again. if you want to do failover, manually remove /masterha/app1/app1.failover.complete and run this script again.
错误解决办法
1.日志文件提示切换master过快,需要删除/masterha/app1/app1.failover.complete
1.删除app1.failover.complete
shell> rm /masterha/app1/app1.failover.complete
5.重新测试:
master转移成功,重新转为156机子
--------------------------------------------------------
master 10.10.54.155 is down!
check mha manager logs at mycentos4:/masterha/app1/manager.log for details.
started automated(non-interactive) failover.
the latest slave 10.10.54.156(10.10.54.156:3306) has all relay logs for recovery.
selected 10.10.54.156 as a new master.
10.10.54.156: ok: applying all logs succeeded.
10.10.54.157: this host has the latest relay log events.
generating relay diff files from the latest slave succeeded.
10.10.54.157: ok: applying all logs succeeded. slave started, replicating from 10.10.54.156.
10.10.54.156: resetting slave info succeeded.
master failover to 10.10.54.156(10.10.54.156:3306) completed successfully.
-----------------------------------------------------------
//附:故障转移后,用命令恢复原来的master
[manager:154]
1.在旧master上执行
mysql> reset master;
mysql> change master to master_host='10.10.54.155', master_port=3306, master_user='rep', master_password='rep123', master_log_file='mysql-bin.000031', master_log_pos=112;
mysql> start slave; #暂时先把旧master变为从
2.然后在manager节点上:
[manager:154]
shell> masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf
##master成功切换回