这个案例是前两天出现的,一直没有时间总结,25号凌晨4点去处理数据库的故障问题。远程连上公司的局域网,psping检查发现服务器的1433端口不通,数据库连接不上,但是主机又能ping通,登录服务器检查发现sql server的sql server (mssqlserver) service 等服务都没有启动.从zabix检查也发现服务停了, 真是懵了,使用systeminfo命令检查系统的情况,发现这台服务器在凌晨3:31重启了,但是对应的sql server服务没有自动启动,
检查错误日志,发现sql server等相关服务的自动启动都失败了,如下所示:
a timeout was reached (30000 milliseconds) while waiting for the sql server (mssqlserver) service to connect.
如上截图所示,其实还有一些自动启动的服务也都出现了错误,继续往下面看,发现错误日志有下面错误信息:
the system has rebooted without cleanly shutting down first. this error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
event 41的kernel-power错误意味着系统在未首先正常关机的情况下重新启动。当系统停止响应、出现故障或意外断电时,会发生此错误。更多相关信息参考
到此,我们知道了系统异常重启了,但是为什么系统重启后,那些自动启动的服务(例如sql server服务都启动失败呢?)什么原因导致呢?“a service does not start, and events 7000 and 7011 are logged in the windows event log” 这里简单的介绍了一下,但是感觉没有详细介绍。
the service control manager waits for the time that is specified by the servicespipetimeout entry before logging event 7000 or 7011. services that depend on the windows trace session manager service may require more than 60 seconds to start. therefore, increase the servicespipetimeout value appropriately to give all the dependent services enough time to start.
for more information, click the following article number to view the article in the microsoft knowledge base:
839803 the windows trace session manager service does not start and event id 7000 occurs
我们可以理解为service control manager等待sql server的服务启动,但是这个服务由于资源问题或一些依赖选项问题,导致它在30秒内没有成功启动,所以service control manager就出错了。网上有人这样介绍:
it could be that some other dependent components (the disk, network shares, etc) take longer to start up. could you set the service to start as automatic (delayed)
其实后面跟系统管理员沟通这个问题,才知道是因为数据库服务器(virtual machine)所在的nutanix一台主机由于故障,vm自动切换到另一台主机,切换过程中vm会重新启动,而且当时出现问题的有3台vm服务器(sql server 2008/2014都有)。
解决方案:
1: 将sqlsql server (mssqlserver)等相关服务的启动类型改为“automatic(delayed start)”。
2:修改serivce timeout的值。
to change the service timeout value:
1:click the start button, then click run, type regedit, and click ok.
2:in the registry editor, click the registry subkey hkey_local_machine\system\currentcontrolset\control.
3:in the details pane, locate the servicespipetimeout entry, right-click that entry and then select modify.#这个值改为60秒或120秒
note: if the servicespipetimeout entry does not exist, you must create it by selecting new on the edit menu, followed by the dword value, then typing servicespipetimeout, and clicking enter.
4:click decimal, enter the new timeout value in milliseconds, and then click ok.
5:restart the computer.
不过这个错误,我没法重现这个错误、从而无法测试验证上诉解决方案能否真正解决问题。不过上面大体分析是基本正确的。
参考资料:
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-r2-and-2008/dd349403(v=ws.10)
https://support.microsoft.com/en-in/help/922918/a-service-does-not-start-and-events-7000-and-7011-are-logged-in-window