前言
我们在处理数据库的问题的时候,有时在数据库端看不到异常的等待,但是应用却报连接异常。
这种情况,有可能是在压力还没有传递到数据库,在网络这层已经发生问题了。如连接风暴,此时的数据库listener的进程已经忙不过来,将连接丢弃。(在主机层面可以看到tcplistenedrop增加)。
我们如果去看普通的listener日志,看到的东西是这样的:
…… 24-feb-2017 09:11:03 * (connect_data=(sid=mynwdb)(cid=(program=sqlplus@mynwdb)(host=mynwdb)(user=appl))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65184)) * establish * mynwdb * 0 24-feb-2017 09:11:06 * (connect_data=(sid=mynwdb)(cid=(program=standard)(host=mynwdb)(user=appl))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65186)) * establish * mynwdb * 0 24-feb-2017 09:11:10 * (connect_data=(sid=mynwdb)(cid=(program=standard)(host=mynwdb)(user=appl))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65188)) * establish * mynwdb * 0 24-feb-2017 09:11:10 * (connect_data=(sid=mynwdb)(cid=(program=standard)(host=mynwdb)(user=appl))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65189)) * establish * mynwdb * 0 fri feb 24 09:11:14 2017 24-feb-2017 09:11:14 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65194)) * establish * mynwdb * 0 24-feb-2017 09:11:14 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65195)) * establish * mynwdb * 0 24-feb-2017 09:11:15 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65196)) * establish * mynwdb * 0 24-feb-2017 09:11:15 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65197)) * establish * mynwdb * 0 24-feb-2017 09:11:15 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65198)) * establish * mynwdb * 0 24-feb-2017 09:11:15 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65199)) * establish * mynwdb * 0 24-feb-2017 09:11:15 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65200)) * establish * mynwdb * 0 24-feb-2017 09:11:15 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65201)) * establish * mynwdb * 0 24-feb-2017 09:11:16 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65202)) * establish * mynwdb * 0 24-feb-2017 09:11:16 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65203)) * establish * mynwdb * 0 24-feb-2017 09:11:16 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65204)) * establish * mynwdb * 0 24-feb-2017 09:11:16 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65205)) * establish * mynwdb * 0 24-feb-2017 09:11:16 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65206)) * establish * mynwdb * 0 24-feb-2017 09:11:17 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65207)) * establish * mynwdb * 0 24-feb-2017 09:11:17 * (connect_data=(sid=mynwdb)(cid=(program=invmgr)(host=mynwdb)(user=nusr))) * (address=(protocol=tcp)(host=11.22.3.123)(port=65208)) * establish * mynwdb * 0 ……
这样我们会看的比较晕,不知道里面的连接数变化。
有一个listener日志的解析利器,可以分析到这种类型的问题。下载见文末。
用法:lsnr_analyzer.pl -t ‘time rexge' [-h] filename
如,我们要分析24-feb-2017 09:30~09:59的连接情况,可以如下操作:
[ora@mynwdb trace]$ lsnr_analyzer.pl -t '24-feb-2017 09:[3-5][0-9]' mynwdb_cut1.log 24-feb-2017 09:30 => total:37 24-feb-2017 09:31 => total:50 24-feb-2017 09:32 => total:17 24-feb-2017 09:33 => total:16 24-feb-2017 09:34 => total:19 24-feb-2017 09:35 => total:16 24-feb-2017 09:36 => total:17 24-feb-2017 09:37 => total:16 24-feb-2017 09:38 => total:13 24-feb-2017 09:39 => total:29 24-feb-2017 09:40 => total:43 24-feb-2017 09:41 => total:38 24-feb-2017 09:42 => total:17 24-feb-2017 09:43 => total:16 24-feb-2017 09:44 => total:42 24-feb-2017 09:45 => total:21 24-feb-2017 09:46 => total:16 24-feb-2017 09:47 => total:15 24-feb-2017 09:48 => total:12 24-feb-2017 09:49 => total:25 24-feb-2017 09:50 => total:47 24-feb-2017 09:51 => total:41 24-feb-2017 09:52 => total:21 24-feb-2017 09:53 => total:12 24-feb-2017 09:54 => total:17 24-feb-2017 09:55 => total:16 24-feb-2017 09:56 => total:20 24-feb-2017 09:57 => total:17 24-feb-2017 09:58 => total:17 24-feb-2017 09:59 => total:21 [ora@mynwdb trace]$
可以看到,在09:31的时候,有个连接数的增加,到了一分钟50个连接。
我们可以加-h的参数,breakdown具体是哪些主机连接过来的:
[ora@mynwdb trace]$ lsnr_analyzer.pl -t '24-feb-2017 09:[3-5][0-9]' -h mynwdb_cut1.log 24-feb-2017 09:30 => total:37 11.22.3.123 37 24-feb-2017 09:31 => total:50 22.3.4.111 1 11.22.3.123 49 24-feb-2017 09:32 => total:17 10.8.1.218 1 11.22.3.123 16 24-feb-2017 09:33 => total:16 11.22.3.123 16 24-feb-2017 09:34 => total:19 11.22.3.123 19 24-feb-2017 09:35 => total:16 11.22.3.123 16 24-feb-2017 09:36 => total:17 22.3.4.111 1 11.22.3.123 16 24-feb-2017 09:37 => total:16 12.5.6.221 1 11.22.3.123 15 24-feb-2017 09:38 => total:13 11.22.3.123 13 24-feb-2017 09:39 => total:29 11.22.3.123 29 24-feb-2017 09:40 => total:43 11.22.3.123 43 24-feb-2017 09:41 => total:38 22.3.4.111 1 11.22.3.123 37 24-feb-2017 09:42 => total:17 12.5.6.221 1 11.22.3.123 16 24-feb-2017 09:43 => total:16 11.22.3.123 16 24-feb-2017 09:44 => total:42 11.22.3.123 42 24-feb-2017 09:45 => total:21 11.22.3.123 21 24-feb-2017 09:46 => total:16 22.3.4.111 1 11.22.3.123 15 24-feb-2017 09:47 => total:15 12.5.6.221 1 11.22.3.123 14 24-feb-2017 09:48 => total:12 11.22.3.123 12 24-feb-2017 09:49 => total:25 11.22.3.123 25 24-feb-2017 09:50 => total:47 11.22.3.123 47 24-feb-2017 09:51 => total:41 22.3.4.111 1 11.22.3.123 40 24-feb-2017 09:52 => total:21 12.5.6.221 1 11.22.3.123 20 24-feb-2017 09:53 => total:12 11.22.3.123 12 24-feb-2017 09:54 => total:17 11.22.3.123 17 24-feb-2017 09:55 => total:16 11.22.3.123 16 24-feb-2017 09:56 => total:20 22.3.4.111 1 11.22.3.123 19 24-feb-2017 09:57 => total:17 12.5.6.221 1 11.22.3.123 16 24-feb-2017 09:58 => total:17 11.22.3.123 17 24-feb-2017 09:59 => total:21 11.22.3.123 21 [ora@mynwdb trace]$
可以看到09:30的时候,是来自11.22.3.123有49个连接,来自22.3.4.111有1个连接。
注1:lsnr_analyzer.pl 是用来分析已经establish的连接,如果是其他连接,不统计在内。
注2:lsnr_analyzer.pl 的统计比较消耗cpu,30万行的listener日志分析大约1秒出结果,300万行的listener日志分析大约6秒出结果。所以如果listener日志比较大,建议tail剪切后统计。
对于已经出现tcplistendrop的情况,我们分几种情况分析:
(1)如果是突然的连接风暴,需要控制前端的连接,如weblogic中间件逐台启动,而不是同时启动;
(2)如果一直连接数很高,导致listener无法处理,就需要拆分listener,如按照端口拆分。
点击下载:
总结
以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作能带来一定的帮助,如果有疑问大家可以留言交流,谢谢大家对www.887551.com的支持。