Nagios

Configuring nagios-plugins-zypper

Since I’m running check_zypper via nrpe (which in turn runs as nobody), I need to set up sudo. In order for the plugin to work, we need to add the following line to /etc/sudoers (by means of visudo):

    nobody ALL = NOPASSWD: /usr/bin/zypper sl, /usr/bin/zypper --non-interactive --no-gpg-checks --terse list-updates

(Keep in mind this needs to be a single line …)

Praxisbuch Nagios by Tobias Scherbaum

Tobi recently finished writing yet another book, which he also talked about in a blog post. Shortly after, I asked him a rather curious question: what exactly is the plant or animal on the cover of the book? He was kind enough to send a voucher copy of the book my way. He actually mentions it in the credits at the beginning of the book. Turns out it is an animal, a sea pen or sea feather (I’m guessing Pennatula aculeata). ...

Nagios: SNMP OID's for IBM's RSA II adapter

Well, after some poking around I finally found some OIDs for the RSAs (only through these two links: check_rsa_fan and check_rsa_temp). For Nagios, I dismissed the fans, since the fan speed is only passed on in percent values. So I only added this:

    define hostgroup{
        hostgroup_name          rsa-snmp
        alias                   Remote Supervisor Adapter (allowing SNMP connections)
    }

    define service{
        use                     generic-perfdata
        check_command           check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.2.1.1!45!60!°C!Temperature CPU0!
        hostgroup_name          rsa-snmp
        service_description     TEMP CPU0
    }

    define service{
        use                     generic-perfdata
        check_command           check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.2.2.1!45!60!°C!Temperature CPU1!
        hostgroup_name          rsa-snmp
        service_description     TEMP CPU1
    }

    define service{
        use                     generic-perfdata
        check_command           check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.5.1.0!29!35!°C!Temperature Ambient!
        hostgroup_name          rsa-snmp
        service_description     TEMP AMBIENT
    }

Oh, and if anyone else is curious like me, here’s the list with the OIDs, courtesy of Gerhard Gschlad and Leonardo Calamai. ...
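For anyone wondering how those long check_command values are read: Nagios splits them at each "!" into the command name followed by $ARG1$, $ARG2$ and so on. Here is a minimal Python sketch of that split. The definition of check_rsa_snmpv1_public is not shown in this post, so what each argument means (OID, warning, critical, unit, label) is only a guess from the values, and dropping the empty field left by the trailing "!" is a choice of this sketch.

```python
def split_check_command(check_command):
    """Split a check_command value into (command_name, {macro: argument})."""
    name, *args = check_command.split("!")
    # The service definitions end in "!", which leaves an empty last field.
    # Dropping it is a simplification made by this sketch.
    if args and args[-1] == "":
        args.pop()
    macros = {f"$ARG{i}$": arg for i, arg in enumerate(args, start=1)}
    return name, macros


name, macros = split_check_command(
    "check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.2.1.1!45!60!°C!Temperature CPU0!"
)
for macro, value in macros.items():
    print(macro, "=", value)
```

Going by the values, $ARG2$ and $ARG3$ would be the 45 and 60 degree limits for the CPU sensors.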

Nagios: check_snmp again

Well, today I had to rack my brain again over the way check_snmp handles WARNING and CRITICAL events. From my point of view, check_snmp is just plain awkward sometimes. As you know, all the other plugins take the WARNING and CRITICAL thresholds as upper limits: if the returned integer is above the threshold, the check goes into WARNING/CRITICAL state. But check_snmp doesn’t play that way. It expects only ranges, and values inside those ranges are NOT gonna result in warning or critical events. Which is kinda stupid, since you gotta think twice about the thresholds 😛 ...
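To make the difference concrete, here is a small Python sketch of the two ways of reading thresholds as described above. It only models the behaviour as I describe it in this post; it is not check_snmp's actual argument parsing, and the function names and (low, high) tuples are made up for illustration.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin return codes


def state_from_thresholds(value, warn, crit):
    """Upper-limit style: alert once the value rises above a threshold."""
    if value > crit:
        return CRITICAL
    if value > warn:
        return WARNING
    return OK


def state_from_ok_ranges(value, warn_range, crit_range):
    """Range style: a value inside a range does NOT raise that alert."""
    warn_lo, warn_hi = warn_range
    crit_lo, crit_hi = crit_range
    if not (crit_lo <= value <= crit_hi):
        return CRITICAL
    if not (warn_lo <= value <= warn_hi):
        return WARNING
    return OK


# The same 45/60 limits, expressed both ways.
print(state_from_thresholds(50, 45, 60))
print(state_from_ok_ranges(50, (0, 45), (0, 60)))
```

Both calls report WARNING for a reading of 50, but in the range style you have to spell out the "good" interval instead of the limit.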

Nagios: NSclient++ in a clustered Environment

Well, most of you already know that I’m a Nagios fanatic. I like to watch as many aspects as I possibly can. So, yesterday I started figuring out ways to watch our different cluster groups (housing a bunch of file shares, try more than 20,000). Now, my first tries failed horribly. I brought down a complete cluster group, resulting in a major annoyance. Today I went at it a bit smarter 😛 I cloned two VMs off my Windows Server 2003 Enterprise R2 template and created a new cluster. ...

Monitoring the IBM BladeCenter chassis with Nagios

Today I ended up working out the details on what we want to monitor regarding our BladeCenter. The most interesting details (for us that is) are these:

- Fan speeds for Chassis Cooling/Power Module Cooling Bay(s)
- Temperature
- Power Domain utilization

It wasn’t that hard to implement. The only troubles I ran into were: (1) IBM did a real shitty job with the MIBs. If you look closely at mmblade.mib, you’re gonna notice that not a single OID is specified for the events. (2) As the MIBs weren’t documented anywhere, I had to look them up via snmpwalk (which I had never used before). So as a reminder (to myself), here’s how it is done:

    snmpwalk -v1 -c public -O n 10.0.0.35 .1.3.6.1.4.1.2.3.51.2.2

This will get you a list with a lot of output (5154 lines to be exact). Lucky me, the web interface/ssh interface of the management module is rather verbose, so all you need to do is compare those values with what you are looking for. So for myself (and anyone interested), read ahead for the list of checks we are currently running on the management module.
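Comparing 5154 lines by eye gets old quickly, so here is a rough Python sketch that searches saved snmpwalk output for a value you read off the web interface. It assumes the usual numeric output format of `-O n` (`OID = TYPE: value`). The sample lines, including their OIDs and values, are made up for illustration and are not real BladeCenter data.

```python
import re

# One line of `snmpwalk -O n` output: ".1.3.6... = TYPE: value"
LINE_RE = re.compile(
    r"^(?P<oid>\.[0-9.]+) = (?:(?P<type>[A-Za-z0-9-]+): )?(?P<value>.*)$"
)


def parse_snmpwalk(text):
    """Return a list of (oid, type, value) tuples; unparsable lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            value = match.group("value").strip('"')
            rows.append((match.group("oid"), match.group("type"), value))
    return rows


def find_oids(rows, needle):
    """Return the OIDs whose value contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [oid for oid, _type, value in rows if needle in value.lower()]


# Made-up sample output (hypothetical OIDs and values).
SAMPLE = '''\
.1.3.6.1.4.1.2.3.51.2.2.99.1.0 = STRING: "24.00 Centigrade"
.1.3.6.1.4.1.2.3.51.2.2.99.2.0 = STRING: "45% of maximum"
.1.3.6.1.4.1.2.3.51.2.2.99.3.0 = INTEGER: 3
'''

rows = parse_snmpwalk(SAMPLE)
print(find_oids(rows, "centigrade"))
```

In practice you would redirect the snmpwalk output to a file and feed its contents to parse_snmpwalk instead of SAMPLE.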

Opsview installation reviewed

Well, I recently (well, yesterday) built the Opsview RPMs for SLES10, and started fiddling about with it today. Alex "recommended" I should look at Opsview rather than Centreon, but boy was there a surprise waiting for me … Opsview has the advantage that it at least lets you use the package manager. But it also needs a lot of manual work (just like Centreon, which I really dislike since it’s really error-prone). ...

Nagios and check_ram yet again

As some people know, I previously "created" check_ram in C (mostly by modifying the check_swap plug-in to print RAM usage). Now, one of my problems for the past few months was reconciling the C plug-in with our "supported" environment. Today I had another look at the plug-ins available on NagiosExchange. There are quite a few, but as I do have some experience with Python, I used the one written in Python. It was rather easy hacking support for performance data into it, as shown below. Someone else already posted a non-unified diff for performance data support, but that ain’t quite right according to the Nagios plug-in development guidelines.
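For reference, the plug-in development guidelines expect performance data after a "|" in the form 'label'=value[UOM];[warn];[crit];[min];[max]. The following Python sketch shows one way to build such a line. It is not the patch from this post; the label ram_used, the function names and the thresholds are made up for illustration.

```python
def format_perfdata(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Build one perfdata item: 'label'=value[UOM];[warn];[crit];[min];[max]."""
    fields = [f"{value}{uom}", str(warn), str(crit), str(minimum), str(maximum)]
    # Trailing empty fields (and their semicolons) can be left out.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return f"'{label}'=" + ";".join(fields)


def ram_check_output(used_pct, warn, crit):
    """Return the plug-in's single output line: status text | perfdata."""
    if used_pct >= crit:
        status = "CRITICAL"
    elif used_pct >= warn:
        status = "WARNING"
    else:
        status = "OK"
    text = f"RAM {status} - {used_pct}% used"
    return f"{text} | " + format_perfdata("ram_used", used_pct, "%", warn, crit, 0, 100)


print(ram_check_output(42, 80, 90))
# RAM OK - 42% used | 'ram_used'=42%;80;90;0;100
```

A real plug-in would also exit with the matching return code (0, 1 or 2), which this sketch leaves out.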

Suspected NRPE weirdness

Well, I just noticed a really weird thing when you have command line arguments enabled. Here’s a snippet from my nrpe.cfg:

    dont_blame_nrpe=1
    command[check_disk]=/usr/lib/nagios/plugins/check_disk -E -w $ARG1$ -c $ARG2$ -p $ARG3$

Now, if you’d check the free space for the root, it ain’t gonna show any inode percentage (that one isn’t what I’m talking about). But if you have to use bind mounts like I do (Tivoli needs a separate "domain", that is, a separate mount point for each domain), you might wanna check the free space on the real device, rather than the free space on the bind mount (which is gonna show you the free space of the parent file system, in my case the root fs). ...

Nagios 3 and hostgroup inheritance

As I wrote some time ago, I was trying to utilize Nagios 3.x’s neat feature of "nested" hostgroups. Well, as it turned out, I thought it worked differently; basically like this:

    define hostgroup {
        hostgroup_name          a-parent-hostgroup
        alias                   Our toplevel parent hostgroup
    }

    define service {
        use                     generic-service
        check_command           check_dummy!0!
        service_description     SSH
        hostgroup_name          a-parent-hostgroup
    }

    define hostgroup {
        hostgroup_name          a-child-hostgroup
        hostgroup_members       a-parent-hostgroup
        alias                   Our child hostgroup
    }

    define service {
        use                     generic-service
        check_command           check_dummy!0!
        service_description     LOAD
        hostgroup_name          a-child-hostgroup
    }

As you can clearly see from the hostgroup_members directive in a-child-hostgroup, I thought you define the relation between two hostgroups in the child hostgroup. The problem with it was basically (as I said in the earlier posts) that all the services defined for the child hostgroups are handed on upwards to the parent hostgroup(s). ...
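Why that happens becomes clearer once you treat hostgroup_members as "pull the hosts of those groups into this group". Here is a simplified Python model of that, using the two hostgroups from the config above. It is only a sketch of how I understand the directive, not Nagios code, and the host names host-a and host-b are made up since the config above doesn't list any members.

```python
def effective_members(group, hostgroups, _seen=None):
    """Hosts of `group`, including hosts pulled in via hostgroup_members."""
    seen = set() if _seen is None else _seen
    if group in seen:  # guard against circular definitions
        return set()
    seen.add(group)
    definition = hostgroups[group]
    hosts = set(definition.get("members", []))
    for sub in definition.get("hostgroup_members", []):
        hosts |= effective_members(sub, hostgroups, seen)
    return hosts


def services_per_host(hostgroups, services):
    """Map each host to the service descriptions it ends up with."""
    result = {}
    for description, group in services.items():
        for host in effective_members(group, hostgroups):
            result.setdefault(host, set()).add(description)
    return result


# Hypothetical hosts: host-a sits in the "parent", host-b in the "child".
hostgroups = {
    "a-parent-hostgroup": {"members": ["host-a"]},
    "a-child-hostgroup": {
        "members": ["host-b"],
        "hostgroup_members": ["a-parent-hostgroup"],
    },
}
services = {"SSH": "a-parent-hostgroup", "LOAD": "a-child-hostgroup"}

for host, descriptions in sorted(services_per_host(hostgroups, services).items()):
    print(host, sorted(descriptions))
```

In this model host-a ends up with both SSH and LOAD while host-b only gets LOAD, which is exactly the "handed on upwards" effect: the group carrying hostgroup_members is really the bigger one.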