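The problem
While checking server load with top, I found mysqld using more than 200% CPU. The most likely causes of a MySQL load this high are missing indexes and a few pathological SQL statements.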
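Troubleshooting approach
1. Identify the type of load: use top to see whether the pressure is on CPU or on I/O.
2. In the mysql client, check the current connections and the SQL statements they are running.
3. Check the slow query log; slow queries may be what is driving the load.
4. Check the hardware, in particular whether a failing disk is responsible.
5. Check the monitoring platform and compare this machine's load across different time periods.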
Identify the load type (top)
    top - 10:14:18 up 23 days, 11:01,  1 user,  load average: 124.17, 55.88, 24.70
    Tasks: 138 total,   1 running, 137 sleeping,   0 stopped,   0 zombie
    Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 95.2%id,  2.0%wa,  0.1%hi,  0.2%si,  0.0%st
    Mem:   3090528k total,  2965772k used,   124756k free,    93332k buffers
    Swap:  4192956k total,  2425132k used,  1767824k free,   756524k cached

      PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
    30833 mysql     15   0 6250m 2.5g 4076 S 257.1 49.9 529:34.45 mysqld
mysqld is at 257.1% CPU while I/O wait (wa) is only 2.0%, so the load is CPU-bound rather than I/O-bound.
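The check above can be scripted. The following is a minimal sketch, not a definitive tool: the function name hot_processes and the default threshold of 100% are illustrative assumptions. It reads top batch output on stdin, locates the %CPU and COMMAND columns from the header row, and prints the PID, command, and %CPU of every process at or above the threshold.

```shell
# hot_processes [LIMIT]: list processes whose %CPU is >= LIMIT (default 100).
# Reads the text produced by "top -b -n 1" on stdin.
# The function name and default limit are hypothetical, for illustration only.
hot_processes() {
    awk -v limit="${1:-100}" '
        # Until the header row is found, look for the %CPU and COMMAND columns.
        !cpu_col {
            for (i = 1; i <= NF; i++) {
                if (toupper($i) == "%CPU")    cpu_col = i
                if (toupper($i) == "COMMAND") cmd_col = i
            }
            next
        }
        # After the header: print PID, command and %CPU for busy processes.
        ($cpu_col + 0) >= (limit + 0) { print $1, $cmd_col, $cpu_col }
    '
}
```

A typical invocation would be: top -b -n 1 | hot_processes 100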
Check the current connections and the SQL statements being executed
    show processlist;
    Id      User      Host              db        Command      Time   State                                                            Info
    192     slave     8.8.8.142:39820   NULL      Binlog Dump  58982  Has sent all binlog to slave; waiting for binlog to be updated   NULL
    194     slave     8.8.8.120:41075   NULL      Binlog Dump  58982  Has sent all binlog to slave; waiting for binlog to be updated   NULL
    424891  biotherm  8.8.8.46:57861    biotherm  Query        493    Sending data                                                     select * from xxx_list where tid = '1112' and del = 0 order by id desc limit 0, 4
    424917  biotherm  8.8.8.49:50984    biotherm  Query        488    Sending data                                                     select * from xxx_list where tid = '1112' and del = 0 order by id desc limit 0, 4
    ..............................................
    430330  biotherm  8.8.8.42:35982    biotherm  Query        487    Sending data                                                     select * from xxx_list where tid = '1112' and del = 0
Many connections are running the same statement, and each has been in the Sending data state for well over 400 seconds.
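When the process list is long, it helps to count how many connections are running each statement. The sketch below assumes tab-separated input as produced by the mysql client in batch mode (-B), with the standard column order Id, User, Host, db, Command, Time, State, Info; the function name count_running_queries is a hypothetical helper, not part of MySQL.

```shell
# count_running_queries: count identical statements among running queries.
# Reads tab-separated SHOW FULL PROCESSLIST output on stdin and prints
# "count<TAB>statement", most frequent first.
# The function name is hypothetical; column positions assume the standard
# SHOW PROCESSLIST layout.
count_running_queries() {
    awk -F '\t' '
        # Column 5 is Command, column 8 is Info (the statement text).
        # The header row has "Command" in column 5, so it is skipped naturally.
        $5 == "Query" { count[$8]++ }
        END { for (q in count) printf "%d\t%s\n", count[q], q }
    ' | sort -rn
}
```

A typical invocation would be: mysql -B -e 'SHOW FULL PROCESSLIST' | count_running_queries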
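Log slow queries
Edit the MySQL configuration file (my.cnf) and add the following lines to the [mysqld] section:
    log_slow_queries = /usr/local/mysql/var/slow_queries.log   # path of the slow query log
    long_query_time = 10                                       # log statements that take longer than 10 s
    log-queries-not-using-indexes = 1                          # also log statements that use no index
Note that log_slow_queries was removed in MySQL 5.6; on newer versions use slow_query_log = 1 together with slow_query_log_file instead.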
Inspect the slow query log
    tail /usr/local/mysql/var/slow_queries.log
    # Time: 130305  9:48:13
    # User@Host: biotherm[biotherm] @  [8.8.8.45]
    # Query_time: 1294.881407  Lock_time: 0.000179  Rows_sent: 4  Rows_examined: 1318033
    SET timestamp=1363916893;
    select * from xxx_list where tid = '11xx' and del = 0 order by id desc limit 0, 4;
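The four parameters
    Query_time: 0  Lock_time: 0  Rows_sent: 1  Rows_examined: 54
These are, in order: the query execution time, the time spent waiting for locks, the number of rows returned, and the number of rows scanned. Focus on the statements that scan the most rows relative to what they return (the entry above examined 1,318,033 rows to return 4), add the appropriate indexes in the database (for that statement, most likely on the columns in its WHERE clause, tid and del), and then optimize the pathological SQL statements themselves.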
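To find the statements that scan the most rows, the log can be ranked by Rows_examined. The following is a rough sketch that assumes the log layout shown above (a "# Query_time: ..." header line followed by the statement); rank_slow_queries is a hypothetical helper name, and the matching is case-insensitive in case the field names were altered in transit.

```shell
# rank_slow_queries [FILE...]: print "rows_examined<TAB>query_time<TAB>statement"
# for each slow-log entry, largest rows_examined first.
# The function name is hypothetical; the parsing assumes the classic
# slow query log text format.
rank_slow_queries() {
    awk '
        # Print the entry collected so far, if any, and reset the buffer.
        function emit() {
            if (have) printf "%s\t%s\t%s\n", rows, qtime, sql
            have = 0
            sql = ""
        }
        { lower = tolower($0) }
        # Header line carrying the four counters.
        lower ~ /^# query_time:/ {
            emit()
            n = split(lower, f, /[ \t]+/)
            for (i = 1; i < n; i++) {
                if (f[i] == "query_time:")    qtime = f[i + 1]
                if (f[i] == "rows_examined:") rows  = f[i + 1]
            }
            have = 1
            next
        }
        # Any other comment line starts a new entry.
        lower ~ /^#/              { emit(); next }
        # The timestamp line is bookkeeping, not part of the statement.
        lower ~ /^set timestamp=/ { next }
        # Remaining lines belong to the statement text.
        have                      { sql = (sql == "" ? $0 : sql " " $0) }
        END                       { emit() }
    ' "$@" | sort -rn
}
```

A typical invocation would be: rank_slow_queries /usr/local/mysql/var/slow_queries.log | head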
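Last resort: kill the SQL thread
To find statements that have been consuming CPU for too long, run the following in the mysql client:
    show processlist;
Once you have confirmed that a statement shows Query in the Command column and its Time value is excessive, note its Id and run:
    kill query 269815764;
Note: killing a running statement can cause data loss, so weigh the importance of the data before doing so.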
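When many connections are stuck on the same statement, typing each Id by hand is slow. The sketch below only generates KILL QUERY statements for review and does not execute anything. It assumes tab-separated SHOW FULL PROCESSLIST output from the mysql client in batch mode; the function name gen_kill_statements and the 300-second default are illustrative assumptions.

```shell
# gen_kill_statements [MIN_SECS]: print a KILL QUERY statement for every
# thread whose Command is Query and whose Time is >= MIN_SECS (default 300).
# Reads tab-separated SHOW FULL PROCESSLIST output on stdin.
# Nothing is executed; review the output before feeding it back to mysql.
# The function name and default threshold are hypothetical.
gen_kill_statements() {
    awk -F '\t' -v min_secs="${1:-300}" '
        # Column 1 is Id, column 5 is Command, column 6 is Time in seconds.
        $5 == "Query" && ($6 + 0) >= (min_secs + 0) {
            printf "KILL QUERY %s;\n", $1
        }
    '
}
```

A typical invocation would be: mysql -B -e 'SHOW FULL PROCESSLIST' | gen_kill_statements 400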