SLES10: zypper-log

Well, I just stumbled upon something. My Nagios at work wasn’t working anymore, so I went looking.

    nagios3 ~ [0] > tail -f /var/log/nagios/nagios.log
    [1238658394] Error: Unable to save status file: No space left on device
    [1238658403] Error: Unable to save status file: No space left on device
    [1238658413] Error: Unable to save status file: No space left on device
    [1238658423] SERVICE ALERT: tsm1;POWER WARN;OK;SOFT;4;-u OK - 0
    [1238658423] Error: Unable to save status file: No space left on device
    [1238658433] SERVICE ALERT: tsm2;LOAD;WARNING;SOFT;1;WARNING - load average: 6.25, 5.72, 5.36
    [1238658433] Error: Unable to save status file: No space left on device
    [1238658443] Error: Unable to save status file: No space left on device
    [1238658453] Error: Unable to save status file: No space left on device
    [1238658463] Error: Unable

After that, zip, nada. Next thing: check whether or not the device is really full … Okay, df .. ...
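
For the "is the device really full" step, here is a minimal sketch. It prints the usage percentage of whichever filesystem holds a given path. The helper name `fs_usage_pct` and the 90 percent cut-off are illustrative assumptions, not something from the original post.

```shell
#!/bin/sh
# Print the usage percentage (digits only) of the filesystem holding $1.
# fs_usage_pct is a hypothetical helper name.
fs_usage_pct() {
    # -P forces the POSIX one-line-per-filesystem format; column 5 is the capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: complain when the filesystem holding the Nagios log is nearly full.
# The 90 percent cut-off is an arbitrary choice.
log_dir=/var/log/nagios
[ -d "$log_dir" ] || log_dir=/    # fall back so the sketch runs anywhere
pct=$(fs_usage_pct "$log_dir")
case "$pct" in
    ''|*[!0-9]*) ;;               # no usable number, stay quiet
    *) if [ "$pct" -ge 90 ]; then
           echo "filesystem holding $log_dir is ${pct}% full"
       fi ;;
esac
```

Once the full filesystem is known, something like `du -sk /var/log/* | sort -n` usually points at the directory that ate the space.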

April 3, 2009 · 3 min · 466 words · christian

Nagios: SNMP OIDs for IBM's RSA II adapter

Well, after some poking around I finally found some OIDs for the RSAs (only through these two links: check_rsa_fan and check_rsa_temp). For Nagios, I dismissed the fans, since the fan speed is only passed on as percent values. So I only added this:

    define hostgroup{
        hostgroup_name      rsa-snmp
        alias               Remote Supervisor Adapter (allowing SNMP connections)
    }

    define service{
        use                 generic-perfdata
        check_command       check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.2.1.1!45!60!°C!Temperature CPU0!
        hostgroup_name      rsa-snmp
        service_description TEMP CPU0
    }

    define service{
        use                 generic-perfdata
        check_command       check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.2.2.1!45!60!°C!Temperature CPU1!
        hostgroup_name      rsa-snmp
        service_description TEMP CPU1
    }

    define service{
        use                 generic-perfdata
        check_command       check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.5.1.0!29!35!°C!Temperature Ambient!
        hostgroup_name      rsa-snmp
        service_description TEMP AMBIENT
    }

Oh, and if anyone else is as curious as me, here’s the list with the OIDs, courtesy of Gerhard Gschlad and Leonardo Calamai. ...

April 1, 2009 · 2 min · 235 words · christian

RPM spec: Installing a custom init-script

Well, I’m once again sitting here racking my brain over how to fix up a certain package. I had to look it up again, so this time I’m writing it down!

    Source1: %{name}.initd
    ...
    install -o root -g root -m 755 %{S:1} $RPM_BUILD_ROOT/etc/init.d/ndo2db

March 26, 2009 · 1 min · 46 words · christian

Windows: Running msconfig as non privileged user

Well, the title is kinda misleading, since you need administrator privileges to run msconfig in its full scope. But this is just a hint to myself on how to execute msconfig without logging out and then logging back in as administrator.

    runas /user:Administrator C:\WINDOWS\pchealth\helpctr\binaries\msconfig.exe

March 25, 2009 · 1 min · 44 words · christian

Nagios: Watching Clustered environments (the other way)

Well, recently I stepped up to watch our cluster environments … Michael has a good howto on how to watch Windows Cluster environments in the NSclient++ wiki. Now, this has its own quirks … which I stumbled upon when trying to write a Linux-HA OCF resource agent for the Nagios NRPE server.

Combining Linux-HA with SLES10 is a good thing generally, but using startproc in that resource agent is not such a good idea. Apparently Novell (or SuSE GmbH) thought it might be wise to include some additional logic in the wrapper: startproc, checkproc and killproc all check for the name of the executable. So if you try to start an additional process with the same name, you need to dig a bit deeper. For this to work, you need two additional things (quotations directly from man 8 startproc):

    -p pid_file
        (Former option -f changed due to the LSB specification.) Use an alternate pid file instead of the default (/var/run/<basename>.pid). The pid read from this file is being matched against the pid of running processes that have an executable with specified path of the program. In order to avoid confusion with stale pid files, a not up-to-date pid will be ignored.

Now, apparently this alone isn’t enough; startproc still refuses to start a second process.

    -i ignore_file
        The pid found in this file is used as session id of the same binary program which should be ignored by startproc.
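
Putting the two options together, a resource agent's start action could call startproc roughly like this. This is only a sketch: every path below is a hypothetical placeholder, and it assumes the second NRPE instance is configured to write its pid to the file passed to -p. By default the script only prints the command.

```shell
#!/bin/sh
# Build the startproc call for a second, cluster-managed NRPE instance.
# All paths are hypothetical; adjust them to the local installation.
NRPE_BIN=/usr/sbin/nrpe
CLUSTER_CFG=/etc/nagios/nrpe-cluster.cfg    # config of the second instance
CLUSTER_PID=/var/run/nrpe-cluster.pid       # pid file of the second instance
LOCAL_PID=/var/run/nrpe.pid                 # pid file of the node-local instance

startproc_cmd() {
    # -p: pid file belonging to the instance we want to start
    # -i: pid file of the already running instance that startproc should ignore
    echo startproc -p "$CLUSTER_PID" -i "$LOCAL_PID" \
        "$NRPE_BIN" -c "$CLUSTER_CFG" -d
}

# Dry run by default; set RUN=1 on a SLES host to actually execute the command.
if [ "${RUN:-0}" = 1 ]; then
    startproc -p "$CLUSTER_PID" -i "$LOCAL_PID" "$NRPE_BIN" -c "$CLUSTER_CFG" -d
else
    startproc_cmd
fi
```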

March 19, 2009 · 3 min · 452 words · christian

Linux-HA: Creating a random authkey

I just looked over the slides of a presentation one of my trainees brought back from Chemnitz, and there was this nifty one-line command that generates a random sha1sum for your authkeys file. Now, since I’m a bit lazy, here’s the full command line to produce the contents of /etc/ha.d/authkeys for you:

    node2 ~ [0] > echo "auth 1
    1 sha1 $( dd if=/dev/urandom count=4 2> /dev/null | openssl dgst -sha1 )"
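
The same idea as a small function, as a sketch rather than the command from the slides: `make_authkeys` is a hypothetical helper name, and `sha1sum` from coreutils stands in for `openssl dgst -sha1` because its output is always the bare hash followed by a file name, which is easy to cut down to the hash alone.

```shell
#!/bin/sh
# Print the contents of a Heartbeat authkeys file with a random SHA1 key.
# make_authkeys is a hypothetical helper name; sha1sum replaces the
# "openssl dgst -sha1" call used in the post.
make_authkeys() {
    key=$(dd if=/dev/urandom bs=512 count=4 2>/dev/null | sha1sum | cut -d' ' -f1)
    # Line 1 selects key number 1, line 2 defines key number 1.
    printf 'auth 1\n1 sha1 %s\n' "$key"
}

# Typical use as root; Heartbeat insists on authkeys being mode 600:
#   ( umask 077; make_authkeys > /etc/ha.d/authkeys )
make_authkeys
```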

March 18, 2009 · 1 min · 74 words · christian

TSM client: Backing up files with umlauts on SLES

In the past, I always had problems with SLES and our Tivoli Storage Manager clients when backing up files with German umlauts. Well, today I looked a bit harder, and quite quickly found a solution.

    sles9 root [0] > env | grep ^LC
    LC_CTYPE=de_DE.UTF-8

As you can see from the above, SLES9/10 ain’t setting LANG or LC_ALL (which I searched for first), but is setting LC_CTYPE. So, simply changing LC_CTYPE in the init-script and/or prepending the dsmc command line with a new LC_CTYPE fixes my umlaut problems! ...
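
The "prepend the command line" variant can be sketched as a tiny wrapper. `with_ctype` is a hypothetical helper name, and `en_US.UTF-8` is only a placeholder value, since the excerpt does not say which LC_CTYPE setting fixed the backups.

```shell
#!/bin/sh
# Run a command with LC_CTYPE overridden for that one process only.
# with_ctype is a hypothetical helper name.
with_ctype() {
    ctype=$1
    shift
    env LC_CTYPE="$ctype" "$@"
}

# Example (needs the TSM client installed; the locale value is a placeholder):
#   with_ctype en_US.UTF-8 dsmc incremental

# Harmless demonstration: show what the child process sees.
with_ctype en_US.UTF-8 sh -c 'echo "LC_CTYPE=$LC_CTYPE"'
```

The override only applies to the child process, so the rest of the init-script keeps whatever locale the system hands it.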

March 2, 2009 · 2 min · 387 words · christian

Nagios: check_snmp again

Well, today I had to rack my brain again over the way check_snmp handles WARNING and CRITICAL events. From my point of view, check_snmp is just plain backwards sometimes. As you know, all the other plugins accept WARNING and CRITICAL thresholds as plain limits: if the returned integer is above the threshold, the service goes into WARNING/CRITICAL state. But check_snmp doesn’t play that way. It only accepts ranges, and those ranges describe the values that are NOT going to result in warning or critical events. Which is kinda stupid, since you have to think twice about the thresholds 😛 ...

February 27, 2009 · 1 min · 135 words · christian

Nagios: NSclient++ in a clustered Environment

Well, most of you already know that I’m a Nagios fanatic. I like to watch as many aspects as I possibly can. So, yesterday I started figuring out ways to watch our different cluster groups (housing a bunch of file shares; try above 20,000). Now, my first tries failed horribly. I brought down a complete cluster group, resulting in a major annoyance. Today I went at it a bit smarter 😛 I cloned myself two VMs off my Windows Server 2003 Enterprise R2 template and created a new cluster. ...

February 26, 2009 · 2 min · 258 words · christian

MySQL: Beware of sync_binlog on EXT3

Well, I just glanced over the my.cnf for our web-cluster again, because I just moved a database from one cluster to another and am getting quite different performance from it. So, as I expected, there is a slight difference between the two configuration files:

    @@ -55,8 +58,6 @@
     innodb_log_group_home_dir = /var/lib/mysql/db
     innodb_log_file_size = 512M
     innodb_thread_concurrency = 8
    -sync_binlog = 1

And apparently, according to the MySQL Performance Blog, that’s really, really bad (on top of that, we’re currently running without write caching, as the battery module of the storage is dead).
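
To spot this kind of difference without reading a whole diff, a small helper can pull the setting out of each config file. This is a sketch: `get_sync_binlog` is a hypothetical name, the file paths in the example are placeholders, and the helper only looks for the underscore spelling of the option.

```shell
#!/bin/sh
# Print the sync_binlog value set in a my.cnf-style file, or "unset".
# get_sync_binlog is a hypothetical helper name.
get_sync_binlog() {
    awk -F= '
        # Match "sync_binlog = N" with optional blanks; commented lines do not match.
        /^[ \t]*sync_binlog[ \t]*=/ {
            gsub(/[ \t]/, "", $2)
            value = $2
        }
        END { print (value == "" ? "unset" : value) }
    ' "$1"
}

# Example: compare the two cluster configs (paths are placeholders).
#   get_sync_binlog cluster-a/my.cnf
#   get_sync_binlog cluster-b/my.cnf
```

On a running server, `SHOW VARIABLES LIKE 'sync_binlog';` reports the value that is actually in effect, which may differ from the file if the server has not been restarted.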

February 23, 2009 · 1 min · 92 words · christian