Nagios: Service Check Timed Out

Since I got the pleasure of watching some Windows boxen with Nagios, I took the Windows Update plugin from Michal Jankowski and implemented it. It took me some time to set up nsclient++ correctly so it just works, but up till now the check plugin sometimes reported the usual “Service Check Timed Out”. Usually I ended up increasing the cscript timeout or the nsclient++ socket timeout, but it still kept showing up. Since I rely heavily on my surveillance tools, I want as few false positives as possible. So I ended up chasing down this error today, and in hindsight it was quite simple. ...

December 7, 2018 · 1 min · 210 words · christian

Nagios: Watching Clustered environments (the other way)

Well, recently I stepped up to watch our cluster environments … Michael has a good howto on how to watch Windows Cluster environments in the NSclient++ wiki. Now, this has its own perks … which I stumbled upon when trying to write a Linux-HA OCF resource agent for the Nagios NRPE server. Combining Linux-HA with SLES10 is a good thing generally, but using startproc in that resource agent is not such a good idea. Apparently Novell (or SuSE GmbH) thought it might be wise to include some additional logic in the wrapper: startproc, checkproc and killproc check for the name of the executable. So if you try to start an additional process with the same name, you need to dig a bit deeper. For this to work, you need two additional things (quotations directly from man 8 startproc):

-p pid_file (Former option -f changed due to the LSB specification.) Use an alternate pid file instead of the default (/var/run/ .pid). The pid read from this file is being matched against the pid of running processes that have an executable with specified path of the program. In order to avoid confusion with stale pid files, a not up-to-date pid will be ignored.

Now, apparently this isn’t enough. startproc still refuses to start a second process.

-i ignore_file The pid found in this file is used as session id of the same binary program which should be ignored by startproc.
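To make the combination of the two options a bit more tangible, here is a minimal sketch of how such a call could look for a second NRPE instance. Only the -p and -i options come from the man page quoted above; every path and the file name nrpe-cluster.cfg are made up for illustration, and the sketch assumes that the second configuration points NRPE at its own pid file. By default it only prints the command; set RUN=1 to actually execute it.

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical, adjust them to your setup.
NRPE_BIN=/usr/sbin/nrpe                   # assumed location of the NRPE binary
CLUSTER_CFG=/etc/nagios/nrpe-cluster.cfg  # hypothetical config of the second instance
CLUSTER_PID=/var/run/nrpe-cluster.pid     # pid file of the clustered instance (-p)
LOCAL_PID=/var/run/nrpe.pid               # pid file of the node-local instance (-i)

# Assemble the startproc call:
#   -p  alternate pid file for the instance we want to start
#   -i  pid file of the already running instance that startproc should ignore
build_start_cmd() {
    echo "startproc -p $CLUSTER_PID -i $LOCAL_PID $NRPE_BIN -c $CLUSTER_CFG -d"
}

if [ "${RUN:-0}" = "1" ]; then
    # Really start the second instance
    eval "$(build_start_cmd)"
else
    # Dry run: just show what would be executed
    build_start_cmd
fi
```

Inside an OCF resource agent, the start action would be the place for a call like this, with the pid files taken from the resource parameters instead of being hard-coded.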

March 19, 2009 · 3 min · 452 words · christian

Nagios: NSclient++ in a clustered Environment

Well, most of you already know that I’m a Nagios fanatic. I like to watch as many aspects as I possibly can. So, yesterday I started figuring out ways to watch our different cluster groups (housing a bunch – try above 20.000 – of file shares). Now, my first tries failed horribly. I brought down a complete cluster group, resulting in a major annoyance. Now, today I went at it a bit smarter 😛 I cloned myself two VMs off my Windows Server 2003 Enterprise R2 template and created a new cluster. ...

February 26, 2009 · 2 min · 258 words · christian

Restarting the NSclient++ service without the management applet

For people who are as point-and-click lazy as me, here is how you restart the service without using the service management applet:

    net stop "NSClientpp (Nagios) 0.3.5.2 2008-09-24 w32"
    net start "NSClientpp (Nagios) 0.3.5.2 2008-09-24 w32"

February 11, 2009 · 1 min · 39 words · christian

Nagios & plugins

Since we started utilizing Nagios’s power two months ago, I finally came up with a C-based RAM plugin for Nagios. The biggest problem I had with the Python- and Perl-based plugins was that some distributions (yes, SLES and Debian) don’t install either Python or Perl. Since I wanted a manageable setup (as in a unified code base across all distributions), I wanted it to work without installing too much. So I took the swap plugin and basically removed what wasn’t necessary, and voila! ...

October 6, 2007 · 1 min · 115 words · christian