Well, I just stumbled upon something… My Nagios at work wasn't working anymore, so I went looking.

nagios3 ~ [0] > tail -f /var/log/nagios/nagios.log
[1238658394] Error: Unable to save status file: No space left on device
[1238658403] Error: Unable to save status file: No space left on device
[1238658413] Error: Unable to save status file: No space left on device
[1238658423] SERVICE ALERT: tsm1;POWER WARN;OK;SOFT;4;-u OK - 0
[1238658423] Error: Unable to save status file: No space left on device
[1238658433] SERVICE ALERT: tsm2;LOAD;WARNING;SOFT;1;WARNING - load average: 6.25, 5.72, 5.36
[1238658433] Error: Unable to save status file: No space left on device
[1238658443] Error: Unable to save status file: No space left on device
[1238658453] Error: Unable to save status file: No space left on device
[1238658463] Error: Unable

After that, zip, nada. Next step: check whether the device is really full. Okay, df…

nagios3 ~ [130] > df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             3.5G  1.2G  2.1G  37% /
udev                  506M   88K  506M   1% /dev
/dev/sdb1             7.9G  7.7G     0 100% /var
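
Side note: "No space left on device" can also mean the filesystem ran out of inodes while df -h still shows free blocks. That's not the case here (Use% is at 100%), but it's worth a quick check whenever the numbers don't add up:

```shell
# -i makes df report inode usage instead of block usage
df -i /var
```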

So the disk really is completely full. Now we need to find out who's hogging the space. Since I had a suspect (pnp4nagios), I went straight for /var/lib …

nagios3 lib [0] > du -sh *
16K     CAM
1.1M    YaST2
8.0K    acpi
4.0K    apache2
28K     autoinstall
16K     dhcpcd
4.0K    empty
96K     hardware
4.0K    logrotate.status
8.0K    misc
78M     mysql
2.1M    nagios
4.0K    net-snmp
4.0K    news
24K     nfs
8.0K    nobody
36K     ntp
4.0K    pam_devperm
824K    php5
359M    pnp4nagios
22M     rpm
28K     scpm
4.0K    smpppd
4.0K    sshd
4.0K    support
8.0K    suseRegister
4.0K    uniconf
4.0K    update-messages
4.0K    wwwrun
33M     zmd
14M     zypp

That wasn't it… so on to the next place that's usually suspect: /var/log.

nagios3 log [0] > du -sh *
5.2G    YaST2
4.0K    acpid
1.4G    apache2
28K     boot.msg
28K     boot.omsg
4.0K    cups
4.0K    dsmerror.log
148K    dsmsched.log
4.0K    faillog
4.0K    krb5
12K     lastlog
4.0K    localmessages
16K     mail
16K     mail.info
198M    messages
0       mysqld.log
14M     nagios
0       ntp
4.0K    pnp4nagios
4.0K    sa
8.0K    scpm
4.0K    vmdesched.log
16K     vmware-imc
4.0K    vmware-tools-guestd
82M     warn
348K    wtmp
115M    zmd-backend.log
24M     zmd-messages.log
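
In hindsight, piping du through GNU sort's human-readable mode would have pointed at the culprit immediately, instead of eyeballing two full listings. A sketch, assuming GNU coreutils:

```shell
# Rank directories by size, biggest first. -h on both du and sort keeps
# the human-readable suffixes (K/M/G) but still compares them numerically.
du -sh /var/log/* | sort -rh | head -n 5
```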

My first thought when I saw that output was "WTF? 5.2G of YaST2 logs?" … For now, I've set up a crontab that empties /var/log/YaST2 every 24 hours …
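
For reference, such a cron job can look roughly like this (a sketch; the schedule and the exact cleanup command are up to you):

```shell
# /etc/cron.d/ fragment: wipe the YaST2 log directory once a day at 03:00
# fields: minute hour day-of-month month day-of-week user command
0 3 * * * root rm -f /var/log/YaST2/*
```

A daily rm is a blunt instrument; a logrotate rule with size limits would be the tidier long-term fix.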