最近两天,一台ORACLE数据库的作业执行delete_ob_get_epps.sh脚本清理过期备份时,执行下面SQL语句就会被阻塞,在监控工具DPA里面部分截图如下(图片分开截断)
sql ‘alter system archive log current’;
如上截图所示,会话ID=650的EVENT为Log archive I/O,被阻塞的会话303在等待事件 enq:WL-contention 关于Log archive I/O的资料如下
Log archive I/O
Used local archiving of online redo logs (for a production database) or standby redo logs (for a standby database). When the archiving process exhausts its I/O buffers because all of them are being used for on-going I/O’s, the wait for an available I/O buffer is captured in this system wait event.
Wait Time: Depends on the speed of the disks
Parameters: None
后面在metalink上找到相关资料:ALTER SYSTEM ARCHIVELOG CURRENT hangs on WL-enqueue (文档 ID 1209896.1),文档描述这是一个bug,这个生产系统为Oracle Database 10g Release 10.2.0.4.0 – 64bit Production,虽然这官方文档描述这个版本出现的版本为Oracle Database – Enterprise Edition – Version 10.2.0.5 and later。相信10.2.0.4可能也会存在这个问题, 具体信息如下:
APPLIES TO:
Oracle Database – Enterprise Edition – Version 10.2.0.5 and later
Information in this document applies to any platform.
SYMPTOMS
ALTER SYSTEM ARCHIVE LOG CURRENT hangs via SQL*Plus, but also during the RMAN-backup.
Therefor the BACKUP ARCHIVELOG ALL, never completes.
Another symptom is that V$ARCHIVE_LOG.APPLIED is not updated
CAUSE
The root-cause is unpublished bug 6113783 – ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK
The session which is executing the ALTER SYSTEM ARCHIVE LOG CURRENT is waiting for the event :
‘enq: WL – contention’
This session holding this enqueue seems to be hanging and therefor blocking the ARCHIVE LOG CURRENT to continue.
Get the blocker with :
SQL> select * from v$lock
where v$lock.type = ‘WL’
and v$lock.lmode > 0
and v$lock.block = 1;
The related process is :
SQL> select v$session.machine, v$session.process, v$session.program
from v$session, v$lock
where v$lock.sid = v$session.sid
and v$lock.type = ‘WL’
and v$lock.lmode > 0
and v$lock.block = 1;
SOLUTION
If the blocker is an archiver process (ARCx) than the issue is related to the unpublished bug 6113783 and is fixed in 11g Release2. (11.2.X)
Some patches exist for 11.1.0.7. Check Patch 6113783
The workaround for 10g is to kill the related archiver process on OS-level.
Unix:
% kill -9 <pid>
The archiver will be restarted automaticly.
如果取消执行归档当前日志,那么上面阻塞就会消息,如果再次执行alter system archive log current,就会又出现这个阻塞,具体相关信息如下
SQL> select * from v$lock
2 where v$lock.type = 'WL'
3 and v$lock.lmode > 0
4 and v$lock.block = 1;
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
0000000409D991D8 0000000409D991F8 615 WL -2.115E+09 980630802 5 0 35788 1
SQL> select v$session.machine, v$session.process, v$session.program
2 from v$session, v$lock
3 where v$lock.sid = v$session.sid
4 and v$lock.type = 'WL'
5 and v$lock.lmode > 0
6 and v$lock.block = 1;
MACHINE PROCESS PROGRAM
----------------------------------- ------------ -----------------------------
getlnx01.gfg1.esquel.com 10790 rman@xxx.xxx.xxx.com (TNS V1-V3)
SQL> select sid, program from v$session where sid in (select sid from v$lock where sid=615);
SID PROGRAM
---------- ------------------------------------------------
615 rman@xxx.xxx.xxx.com (TNS V1-V3)
然后我也测试验证了一下,取消执行归档当前日志操作,阻塞立马消失;执行切换redo log(alter system switch logfile),发现redo log又成功归档了。不会出现这个问题。但是比较奇怪的是之前没有出现这个问题。不清楚什么条件触发了这个bug。
参考资料:
ALTER SYSTEM ARCHIVELOG CURRENT hangs on WL-enqueue (文档 ID 1209896.1)