ASM Instance Crash Caused by Adding a Disk

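Today we added a disk (an EMC DMX4 LUN) to an ASM disk group on a core system, and both nodes of the RAC database promptly crashed, which gave us a real scare.
The log showed: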
note: disk in mode 0x8 marked for de-assignment
error: diskgroup dgidx1 was not mounted
ora-15032: not all alterations performed
ora-15040: diskgroup is incomplete
ora-15042: asm disk 16 is missing from group number 4
error: alter diskgroup dgidx1 mount /* asm agent *//* {1:8345:41140} */
thu nov 06 15:17:41 2014
errors in file /oraclelog/grid/diag/asm/+asm/+asm1/trace/+asm1_pz99_22545054.trc:
ora-27063: number of bytes read/written is incorrect
ibm aix risc system/6000 error: 16: device busy
additional information: -1
additional information: 4096
warning: read failed. group:0 disk:10 au:0 offset:0 size:4096
errors in file /oraclelog/grid/diag/asm/+asm/+asm1/trace/+asm1_pz99_22545054.trc:
ora-27063: number of bytes read/written is incorrect
ibm aix risc system/6000 error: 16: device busy
additional information: -1
additional information: 4096
warning: read failed. group:0 disk:9 au:0 offset:0 size:4096
errors in file /oraclelog/grid/diag/asm/+asm/+asm1/trace/+asm1_pz99_22545054.trc:
ora-27063: number of bytes read/written is incorrect
ibm aix risc system/6000 error: 16: device bus
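The newly added disk was unusable. When both nodes attempted ASM and database instance recovery, however, the second node did come back up, so the remaining problem was that the first node could not read the LUN. The likely cause was a storage-side lock on this LUN for that host, or a SAN path problem. On the second node's ASM instance, v$asm_operation returned no rows, so the rebalance triggered by adding the disk had apparently already completed. To get this production system back online as quickly as possible, we chose to drop the problem LUN from the disk group through the second ASM instance and restore the original layout. After doing this in asmca, we checked the rebalance progress: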
sql> select * from v$asm_operation;

group_number opera stat      power     actual      sofar   est_work   est_rate
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
est_minutes error_code
----------- --------------------------------------------
           4 rebal run           1          1      36659      49690       2899
          4
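The sofar, est_work and est_rate columns above are enough to estimate how far the rebalance has progressed. The following is only a minimal sketch of that arithmetic: `rebal_progress` is a hypothetical helper name, not an Oracle tool, and it assumes the three values are copied by hand from the query output.

```shell
# Hypothetical helper: estimate rebalance progress from v$asm_operation values.
# $1 = sofar, $2 = est_work, $3 = est_rate (work units per minute)
rebal_progress() {
    LC_ALL=C awk -v sofar="$1" -v total="$2" -v rate="$3" 'BEGIN {
        # Avoid dividing by zero when ASM has not produced an estimate yet.
        if (total <= 0 || rate <= 0) { print "no estimate available"; exit }
        printf "%.1f%% done, about %d min left\n", 100 * sofar / total, (total - sofar) / rate
    }'
}
```

With the values shown above, `rebal_progress 36659 49690 2899` prints `73.8% done, about 4 min left`, which agrees with the est_minutes value of 4 in the query output.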
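Roughly 49 GB of data had to be moved in total. Once sofar reached est_work, the LUN had been dropped successfully, and at that point the ASM instance on the first node also started successfully. Lesson learned: before the database actually starts using a disk, it is essential to verify that the device is usable. Oracle ACS also pointed out a tool, kfod (in $grid_home/bin), that can quickly check whether a LUN is valid. Mr. Gai has also written a brief introduction to it: kfod in oracle_asm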
# On node 2, the disk's information can be found
$ kfod disk=all |grep 113
 139: 51930 mb /dev/rhd113 grid asmadmin
# On node 1, the disk's information cannot be found, which means Oracle GI cannot correctly identify the LUN.
$ kfod disk=all |grep 113
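The manual comparison above could be turned into a pre-check that runs before a disk is added. The sketch below is one possible approach, not an Oracle-provided procedure. `kfod_missing` and `can_read_4k` are hypothetical helper names. The parsing assumes every kfod row looks like the single row shown above (index, size, unit, device path, owner, group), and the read probe simply mirrors the 4096-byte read at offset 0 that failed in the log.

```shell
# Hypothetical helper: list devices present in one saved `kfod disk=all`
# output ($1, the reference node) but absent from another ($2, the node
# being checked). Assumes rows such as: "139: 51930 mb /dev/rhd113 grid asmadmin".
kfod_missing() {
    awk '$1 ~ /^[0-9]+:$/ {
             if (FILENAME == ARGV[1]) have[$4] = 1      # devices the checked node sees
             else if (!($4 in have)) print $4           # seen only on the reference node
         }' "$2" "$1"
}

# Hypothetical helper: try to read the first 4096 bytes of a device.
# Returns non-zero if the read fails (for example with "device busy").
can_read_4k() {
    dd if="$1" of=/dev/null bs=4096 count=1 2>/dev/null
}
```

One way to use it: save `kfod disk=all` to a file on each node, copy both files to one host, and run `kfod_missing node2.txt node1.txt`. Any device it prints is visible on node 2 but not on node 1. Running `can_read_4k /dev/rhd113` as the grid owner on every node is a further check that each node can actually read the device before it joins a disk group.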
Original article: "ASM Instance Crash Caused by Adding a Disk". Thanks to the original author for sharing.
