Nagios virtualization

As virtualization seems to be a trendy thing to do, I went ahead and virtualized our nagios (while reinstalling the whole thing …). Now as I went into work today and started my email client, I received 4 nagios warnings about a LOAD service reaching critical state. I looked at the nagios box itself, opened up the VM console, looked into the syslog. Nothing. Yet over 3/4 of the services were flapping, and some ping checks were critical (for whatever reason). So I opened the nagios web interface again, and noticed it dropping the connection over and over again (I had to reauthenticate again and again). ...

August 16, 2014 · 2 min · 232 words · christian

Nagios Hostgroup Inheritance

As I wrote earlier, I recently virtualized our nagios. Along with that came a complete "redesign" of how checks are applied. Up till now, I defined checks for each and every single server, thus ending up with ~25 files, each holding roughly 6 checks which sit in the same file, just sorted by hostname. As you can imagine, it gets quite confusing with that amount of checks (~150). So I spent the last two days working out (with Visio) on which object/hostgroup placing a check would make sense. Now, this is my first result of two days of planning, reorganizing, reordering and moving hosts into different hostgroups. ...
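To make the idea concrete, here is a minimal sketch of what moving checks from hosts to hostgroups buys you. It is not the actual configuration from this setup: the hostgroup names, host names and check names are all made up for illustration, and it only models the expansion in Python rather than using Nagios' own config syntax.

```python
from collections import defaultdict

# Hypothetical hostgroup membership; names are illustrative only.
HOSTGROUPS = {
    "linux-servers": ["web01", "db01"],
    "web-servers": ["web01"],
}

# Checks attached once per hostgroup instead of once per host (also hypothetical).
HOSTGROUP_CHECKS = {
    "linux-servers": ["LOAD", "DISK"],
    "web-servers": ["HTTP"],
}


def effective_checks(hostgroups, hostgroup_checks):
    """Expand hostgroup-level checks into a sorted check list per host."""
    per_host = defaultdict(set)
    for group, hosts in hostgroups.items():
        for host in hosts:
            # A host collects the checks of every hostgroup it is a member of.
            per_host[host].update(hostgroup_checks.get(group, []))
    return {host: sorted(checks) for host, checks in per_host.items()}


if __name__ == "__main__":
    expanded = effective_checks(HOSTGROUPS, HOSTGROUP_CHECKS)
    for host, checks in sorted(expanded.items()):
        print(host, checks)
```

The point of the sketch: three check definitions cover two hosts, and adding a host to a hostgroup gives it that group's checks without touching any per-host file.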

August 16, 2014 · 1 min · 149 words · christian

Cascading Style Sheets are really weird

So here I was, sitting around and thinking about formatted classes for my paragraphs. Now the result is quite pleasing, but has some side effects. But see for yourself … [Screenshot: messed up CSS] As you can see, the browser is reusing the background-image URL from the element within the element, even though the element initially had none. Even putting a background-image: none; into the class doesn’t get me anywhere. The weird thing is that only Firefox is displaying it this messed up (not so weird when you think about how IE 6 treats standards). So if any of you CSS wizards has a suggestion, I’m listening 😄 ...

August 16, 2014 · 1 min · 117 words · christian

Windows Cluster Service (continued)

Well, guess my "solution" didn’t work sooo good. Lemme tell you what’s happening. I successfully added the node to the cluster group, but I can’t get any resources online. The node tries bringing a resource online, then shows a failure and immediately moves it over to the next node. There the resource is successfully brought online .. So again, I’m out of ideas .. Already tried reinstalling the box, after that I could get the third node successfully into the cluster, without the "Advanced (minimum)" trick … 🤷 still ain’t bringing any resources online.

August 16, 2014 · 1 min · 96 words · christian

SUSE Linux Enterprise Server 10 on VMware ESX

We’re currently having a really weird problem with our VMs. Sometime last week, SUSE released a kernel update. Now, once you install it and you reboot the selected VM with a DVD/CD image present, you’re gonna see this: msg.vmxaiomgr.retrycontabort.unkown. The only workaround so far has been to unmount and cleanse any CD drives attached to the VM. And yes, this is reproducible; even reinstalling from scratch doesn’t change the fact that the VM quits working after installing the patch. ...

August 16, 2014 · 1 min · 133 words · christian

SUSE Linux Enterprise Server 10 on VMware ESX (continued)

Well, after some searching today (we applied the VMware Update 2 today, thus the VMware Tools update too), I finally found out what is causing that problem. Though the problem seems not to be limited to virtual systems alone, I just browsed through this Novell Forum thread which pretty much describes my problem. I found the same error in the VMs where I tried to mount a CD image. ...

August 16, 2014 · 1 min · 149 words · christian

More VirtualCenter troubles

Well, after my co-worker switched the VirtualCenter certificates with ones produced by our RA a few days ago, I can’t clone anything using a customization specification anymore: "Unable to decrypt passwords in customization specification". Guess we’re shit outa luck. At least neither of those linked VMTN discussions contains any solution that is workable for us (well, besides storing the password in cleartext in the spec, which ain’t sooo good). Gonna bug him tomorrow to open up a VMware support request, maybe that’ll help somewhat. I sure hope so.

August 16, 2014 · 1 min · 90 words · christian

patch2mail for SLES10

Well, there is this “nifty” tool called patch2mail, which basically converts the XML for the updates to a more readable format. But you’re screwed if you want to do the same on SLES10. Since it ain’t shipping with the zypper xml wrapper thing, you need to do it a bit differently. So I ended up writing a small (and yet ugly) shell script to generate me a mail of my liking .. ...

August 8, 2014 · 2 min · 302 words · christian

MessPC Ethernetbox 2 and Nagios

As I talked to Tobi yesterday, we came to talk about our Ethernet Box thermometer. It’s a neat device, which works pretty much out of the box. Integrating it with Nagios is a bit of a bummer. That’s what the ~300 EUR box looks like. It’s basically a small black box with an RJ45 jack, and four RJ11 jacks for attached external devices. The box itself only functions as a "management station" and doesn’t come with a sensor. Normally, you can attach up to four RJ11 sensors to it. But MessPC also has RJ11 port splitters, which enable you to attach up to eight RJ11 sensors to the MessPC. As you can see, the box has an RJ45 jack on the other side, which you basically hook up to your network and then configure an IP address (or if you fancy DHCP for those things, it’s possible too). On the opposite side are the RJ11 jacks for the sensors. As you can see, we currently have 4 splitters attached to the box, enabling up to 8 sensors to be measured. Once you have it up and running, you can look at the web interface and you’ll be able to see the state of the sensors right on the first page.

August 8, 2014 · 3 min · 480 words · christian

Linux-HA and Tivoli Storage Manager

Well, since we received part of our shipment on Wednesday, I finally looked at how we’re gonna deploy our active/active Tivoli Storage Manager configuration. Right now, we have a single pSeries box hosting ~100 client nodes, which we’re looking to split in two (since we have two x366 for that purpose now). Now, as there ain’t no solution for this scenario yet (neither from International Business Machines nor from someone in the open source community), I sat down and started writing an OCF Resource agent for dsmserv (that is the Tivoli Storage Manager server). ...

August 8, 2014 · 13 min · 2746 words · christian