Node 1 alert log:
pdb(17):transaction recovery: lock conflict caught and ignored
pdb(17):transaction recovery: lock conflict caught and ignored
pdb(17):transaction recovery: lock conflict caught and ignored
…
Node 2 alert log:
pdb(17):minact-scn: useg scan erroring out with error e:12751
pdb(17):minact-scn: useg scan erroring out with error e:12751
pdb(17):minact-scn: useg scan erroring out with error e:12751
…
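Cause:
A partitioned table, set to NOLOGGING at the table level, was being loaded with parallel inserts in batches of 40 million rows, over 600 million rows in total. This led to a deadlock, so I queried for the sessions holding locks: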
-- generate an ALTER SYSTEM KILL SESSION statement for every session that holds a lock
select 'alter system kill session '||chr(39)||t2.sid||','||t2.serial#||chr(39)||';'
from v$locked_object t1, v$session t2
where t1.session_id = t2.sid
order by t2.logon_time;
alter system kill session '4843,29019';
…
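Every row belonged to the same session: more than 3,000 rows in all.
Killing the session did not release the locks, so I killed its process with kill -9. After that, the alert logs started showing the messages above.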
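The kill query above emits one statement per row of v$locked_object, which (as far as I understand it) has a row for every object a transaction holds a lock on, so a single session locking many partitions can show up thousands of times. Below is a minimal Python sketch that collapses such a result set into one statement per (sid, serial#) pair. The function name kill_statements is hypothetical, and the input rows just stand in for whatever your client fetched from the query.

```python
from collections.abc import Iterable


def kill_statements(rows: Iterable[tuple[int, int]]) -> list[str]:
    """Build one ALTER SYSTEM KILL SESSION statement per distinct (sid, serial#).

    `rows` stands in for the (t2.sid, t2.serial#) pairs returned by the
    v$locked_object / v$session join. Duplicate pairs (one per locked
    object) are dropped while keeping first-seen order.
    """
    seen: set[tuple[int, int]] = set()
    statements: list[str] = []
    for sid, serial in rows:
        if (sid, serial) in seen:
            continue  # same session already handled
        seen.add((sid, serial))
        statements.append(f"alter system kill session '{sid},{serial}';")
    return statements


if __name__ == "__main__":
    # Hypothetical result set: the session from this post repeated once per locked object.
    rows = [(4843, 29019)] * 3000
    for stmt in kill_statements(rows):
        print(stmt)  # prints a single statement
```

The same deduplication could be pushed into the query with SELECT DISTINCT, but Oracle then only allows ORDER BY on selected expressions, so t2.logon_time would have to be added to the select list or the ORDER BY dropped.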
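I then searched around. Some sources online suggest dumping the data blocks, but that was beyond my skill level. Fortunately this was only a stress-test environment, so I stopped inserting into this table and waited for the recovery to finish on its own.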
1. Check which undo segment the recovery is using
-- run as SYS: x$ktuxe and undo$ are not visible to ordinary users
select b.name useg, b.inst# instid, b.status$ status, a.ktuxeusn xid_usn,
       a.ktuxeslt xid_slot, a.ktuxesqn xid_seq, a.ktuxesiz undoblocks,
       a.ktuxesta txstatus
from x$ktuxe a, undo$ b
where a.ktuxecfl like '%DEAD%'   -- dead transactions waiting to be rolled back
and a.ktuxeusn = b.us#;
useg                  instid status xid_usn xid_slot xid_seq undoblocks txstatus
--------------------- ------ ------ ------- -------- ------- ---------- --------
_syssmu30_2947991045$      1      3      30       10   50494    3572115 active
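To get a feel for how much undo the dead transaction had to roll back, the undoblocks count can be turned into a size. This is a minimal Python sketch that assumes an 8 KB database block size (DB_BLOCK_SIZE = 8192, the common default); the post does not state the block size, so check your own instance. The function name undo_size_gib is hypothetical.

```python
# Assumption: 8 KB block size; the actual DB_BLOCK_SIZE is not given in the post.
BLOCK_SIZE_BYTES = 8192


def undo_size_gib(undo_blocks: int, block_size: int = BLOCK_SIZE_BYTES) -> float:
    """Convert an x$ktuxe.ktuxesiz block count into GiB."""
    return undo_blocks * block_size / 1024**3


if __name__ == "__main__":
    # undoblocks value from the query output above
    print(f"{undo_size_gib(3572115):.2f} GiB")  # prints "27.25 GiB"
```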
2. Check recovery progress
-- ktuxesiz (undo) counts down as the dead transaction is rolled back
select ktuxeusn usn, ktuxeslt slot, ktuxesqn seq, ktuxesta state, ktuxesiz undo
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%'
order by ktuxesiz asc;
usn slot   seq state     undo
--- ---- ----- ------ -------
 30   10 50494 active 2815649
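Recovery proceeded slowly, at roughly 10,000 blocks per minute, so the 3,572,115 blocks took about 6 hours. When I checked the next day, undo was back to normal and the alert-log warnings had stopped.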
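Because ktuxesiz shrinks as the rollback proceeds, two samples taken a known time apart give a rate, and dividing the blocks left by that rate gives a rough ETA. Below is a minimal Python sketch of that arithmetic. It assumes the rollback rate stays roughly constant, which it may not in practice; the function names are hypothetical, and the example only plugs in the numbers reported in this post.

```python
def blocks_per_minute(undo_first: int, undo_second: int, minutes_apart: float) -> float:
    """Rollback rate from two ktuxesiz samples of the same dead transaction."""
    if minutes_apart <= 0:
        raise ValueError("minutes_apart must be positive")
    return (undo_first - undo_second) / minutes_apart


def minutes_remaining(undo_blocks_left: int, rate_blocks_per_min: float) -> float:
    """Time left if the rollback rate stays roughly constant (an assumption)."""
    if rate_blocks_per_min <= 0:
        raise ValueError("no rollback progress between samples")
    return undo_blocks_left / rate_blocks_per_min


if __name__ == "__main__":
    rate = 10_000  # blocks per minute, the rate observed in this post
    print(f"full rollback: {minutes_remaining(3572115, rate) / 60:.1f} h")       # 6.0 h
    print(f"from second sample: {minutes_remaining(2815649, rate) / 60:.1f} h")  # 4.7 h
```

In a live incident, blocks_per_minute would be fed by re-running the progress query a few minutes apart and noting the clock time of each run.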