
HOWTO: Troubleshoot Linux free space issues

As many of us are not Linux administrators (and I'm not claiming to be one myself), it seems prudent to have a quick HOWTO on resolving space issues on a Linux server. Recently we received a Nagios alert that the / (root) partition on the Nagios server itself was full. Here are some basic Linux system administration steps and tools you can use to locate and resolve these issues.

1) Log in to the box using PuTTY to start an SSH session, using a NON-root account. Direct root login is almost always denied.

2) Once logged in, run "su -" to become root. That's a single dash character. The difference between "su" and "su -" is that the "-" starts a full login shell, so root's own environment and profile files (.profile, etc.) are loaded instead of keeping your session's settings.

3) First, let's find out which filesystem is USING the space: run "df -h" ("disk free"). "-h" means human-readable form, e.g. 10G, 8K, 2T.

[root@servernms1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       19G   17G 1017M  95% /
/dev/sda1              99M   31M   64M  33% /boot
tmpfs                 1.5G     0  1.5G   0% /dev/shm

So we do in fact see that the "/" mount (on the logical volume /dev/mapper/VolGroup00-LogVol00) is 95% full.
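Notice that df put the long LVM device name on a line of its own and pushed the numbers onto the next line. If you want to script this check, df -P (POSIX output format) keeps each filesystem on a single line. Here is a minimal sketch, assuming bash and awk are available; the function name over_threshold and its 90% default are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -P` output on stdin and print
# "<mount point> <Use%>" for every filesystem at or above the threshold.
# Column 5 of `df -P` is the capacity (e.g. "95%"), column 6 the mount point.
# Assumes mount points contain no spaces.
over_threshold() {
  local threshold="${1:-90}"   # percentage; 90 is an arbitrary default
  awk -v t="$threshold" '
    NR > 1 {                   # skip the header line
      use = $5
      sub(/%$/, "", use)       # "95%" -> "95"
      if (use + 0 >= t + 0) print $6, $5
    }'
}
```

On the server above, "df -P | over_threshold 90" should print only the root filesystem, along the lines of "/ 95%".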

4) Next, we want to find out where our space is going. Run "du / --max-depth=1 -h". du stands for "disk usage"; here it runs against the "/" folder, "--max-depth=1" means "I don't care about the details of subfolders, just show me a summary one folder deep", and "-h" again means human-readable. Expect this command to take quite a while to run. It is effectively the Linux equivalent of running TreeSize Free on the C: drive.

[root@servernms1 ~]# du / --max-depth=1 -h
8.0K    /media
119M    /etc
5.3G    /usr
0       /misc
4.4G    /store
34M     /sbin
26M     /boot
0       /sys
du: cannot read directory `/proc/10309': No such file or directory
du: cannot read directory `/proc/10310': No such file or directory
0       /proc
8.0K    /selinux
20K     /mnt
124K    /home
0       /net
8.0K    /srv
23M     /opt
236M    /lib
3.4M    /tmp
6.1G    /var
158M    /root
16K     /lost+found
64K     /dev
114M    /bin
17G     /

You can expect a few "cannot read directory" errors like these under /proc. They are harmless: /proc lists running processes, and these two exited while du was scanning. What matters is that only three folders are measured in gigabytes: /usr (5.3G), /store (4.4G) and /var (6.1G).
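On GNU/Linux systems the same survey can be made less noisy and easier to read. The sketch below (the function name largest_dirs is hypothetical) adds -x so du stays on the filesystem you start on, which skips /proc, /sys and any other mounts, discards the error messages, and sorts the result with sort -h so the biggest folders end up at the bottom. It assumes GNU coreutils new enough to have sort -h; on an older release, du -xk with sort -n gives the same ordering in KB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one-level disk usage summary of a directory,
# sorted smallest to largest so the biggest entries end up at the bottom.
#   -x              stay on the starting filesystem (skips /proc, /sys, other mounts)
#   --max-depth=1   summarise only the immediate subdirectories
#   2>/dev/null     hide "cannot read directory" noise
#   sort -h         compare human-readable sizes such as 8.0K and 2.3G
largest_dirs() {
  du -xh --max-depth=1 "${1:-.}" 2>/dev/null | sort -h
}
```

For example, "largest_dirs /" or "largest_dirs /var | tail -n 5".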

5) Pick one of those folders, change into it, and re-run the same du command, this time against the current folder "." instead of the root folder "/". Because of "--max-depth=1", you get a one-level summary of the new folder.

[root@servernms1 ~]# cd /var
[root@servernms1 var]# du . --max-depth=1 -h
672K    ./db
8.0K    ./local
140K    ./named
16K     ./ftp
24K     ./empty
20K     ./yp
8.0K    ./nis
8.0K    ./racoon
36K     ./lock
8.0K    ./preserve
8.0K    ./games
2.3G    ./spool
12K     ./account
8.0K    ./tux
8.0K    ./rrdtool
2.3G    ./log
8.0K    ./opt
144M    ./cache
1.4G    ./lib
784K    ./tmp
26M     ./www
388K    ./run
6.1G    .

Again, only three folders are in the GB range: spool, log and lib. We know/expect that /var/lib will be large (think C:\WINDOWS\SYSTEM), but /var/spool and /var/log should not be. These likely hold outbound mail and/or temp and log files.
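Steps 5 and 6 repeat the same move: change into the biggest folder and run du again. If you do this often, the loop can be scripted. This is a rough sketch under a few assumptions: GNU du, bash, and a hypothetical function name follow_largest. It stops at the first folder with no subdirectories, and it only ever follows the single largest subdirectory, so when two folders are close in size (like /var/spool and /var/log here) you should still check the runner-up yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper automating the "cd into the biggest folder and re-run du"
# loop: starting at a directory, print it, step into its largest immediate
# subdirectory, and repeat until a directory has no subdirectories.
follow_largest() {
  local dir="${1:-.}" next
  while true; do
    printf '%s\n' "$dir"
    # du -k prints "<KiB><TAB><path>"; sort numerically, keep only the paths,
    # drop the line for $dir itself, and take the largest remaining child.
    next=$(du -xk --max-depth=1 "$dir" 2>/dev/null | sort -n | cut -f2- \
             | DIR="$dir" awk '$0 != ENVIRON["DIR"]' | tail -n 1)
    [ -n "$next" ] || break    # no subdirectories left
    dir="$next"
  done
}
```

For example, "follow_largest /var" prints the chain of largest folders starting at /var.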

6) Let’s check on /var/log in the same way:

[root@servernms1 var]# cd /var/log
You have new mail in /var/spool/mail/root
[root@servernms1 log]# du . --max-depth=1 -h
8.0K    ./conman
16K     ./mail
5.8M    ./sa
8.0K    ./ppp
19M     ./audit
32K     ./prelink
16K     ./nagios
24K     ./cups
8.0K    ./vbox
2.1G    ./httpd
28K     ./news
8.0K    ./samba
8.0K    ./squid
8.0K    ./pm
8.0K    ./conman.old
2.3G    .

So the first thing we notice is that the shell tells us there's new mail in root's mailbox as we run the command. That's odd in and of itself, and likely points to a large amount of outbound mail that perhaps isn't getting delivered, or to notices sent TO root (like an event log) that are never being cleared.

Ignoring that, let’s look at the /var/log/httpd folder:

[root@servernms1 log]# cd /var/log/httpd
[root@servernms1 httpd]# du . --max-depth=1 -h
2.1G    .

2.1GB in one folder. Now we just run "ls -lah" to get the list of files:

[root@servernms1 httpd]# ls -lah
total 2.1G
drwx------  2 root root 4.0K Oct  6 04:04 .
drwxr-xr-x 17 root root 4.0K Oct 10 04:03 ..
-rw-r--r--  1 root root 422M Oct 10 15:35 access_log
-rw-r--r--  1 root root 627M Oct  6 04:04 access_log.1
-rw-r--r--  1 root root 235M Sep 29 04:02 access_log.2
-rw-r--r--  1 root root 471M Sep 22 14:50 access_log.3
-rw-r--r--  1 root root 343M Sep 15 04:02 access_log.4
-rw-r--r--  1 root root 380K Oct 10 15:34 error_log
-rw-r--r--  1 root root  24M Oct  6 04:04 error_log.1
-rw-r--r--  1 root root  82K Sep 29 04:02 error_log.2
-rw-r--r--  1 root root 852K Sep 22 14:50 error_log.3
-rw-r--r--  1 root root 6.1M Sep 15 04:03 error_log.4
-rw-r--r--  1 root root    0 Aug 25 04:02 ssl_access_log
-rw-r--r--  1 root root  39K Aug 22 00:06 ssl_access_log.1
-rw-r--r--  1 root root  70K Aug 18 03:56 ssl_access_log.2
-rw-r--r--  1 root root  56K Aug 11 03:54 ssl_access_log.3
-rw-r--r--  1 root root  70K Aug  4 03:54 ssl_access_log.4
-rw-r--r--  1 root root  237 Oct  6 04:04 ssl_error_log
-rw-r--r--  1 root root  237 Sep 29 04:02 ssl_error_log.1
-rw-r--r--  1 root root  237 Sep 22 14:50 ssl_error_log.2
-rw-r--r--  1 root root  711 Sep 22 13:43 ssl_error_log.3
-rw-r--r--  1 root root  237 Sep  8 04:04 ssl_error_log.4
-rw-r--r--  1 root root    0 Aug 25 04:02 ssl_request_log
-rw-r--r--  1 root root  48K Aug 22 00:06 ssl_request_log.1
-rw-r--r--  1 root root  87K Aug 18 03:56 ssl_request_log.2
-rw-r--r--  1 root root  69K Aug 11 03:54 ssl_request_log.3
-rw-r--r--  1 root root  87K Aug  4 03:54 ssl_request_log.4
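Within a single folder, "ls -lahS" sorts the listing by size, largest first. To skip the folder-by-folder drill-down altogether, you can also ask find for big files directly. A minimal sketch, assuming GNU find (for -printf) and a hypothetical function name big_files; the 100M default threshold is arbitrary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files larger than a size threshold under a
# directory, biggest first, without crossing into other filesystems.
#   -xdev          do not descend into other mounted filesystems
#   -size +100M    larger than 100 MiB (GNU find rounds sizes up to whole units)
#   -printf        GNU find: size in bytes, a tab, then the path
big_files() {
  local dir="${1:-.}" min="${2:-100M}"
  find "$dir" -xdev -type f -size "+$min" -printf '%s\t%p\n' 2>/dev/null | sort -rn
}
```

For example, "big_files / 500M" would list every file over roughly 500 MB on the root filesystem, with sizes in bytes.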

I'm not sure we care about hundreds of MB of historical access_log.N files; only "access_log" is current. Let's get rid of the rotated copies.

[root@servernms1 httpd]# rm -rf access_log.?
You have new mail in /var/spool/mail/root

Again, this crazy "new mail". Note that it says the mail is in "/var/spool/mail/root", and we already identified "/var/spool" as a potential problem folder.

Still, if we run du again after deleting the files:

[root@servernms1 httpd]# du . --max-depth=1 -h
454M    .

Down from 2.1GB to 454MB, a reduction of nearly 80%.
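Two notes on the cleanup above. The glob "access_log.?" only matches single-character suffixes (access_log.1 to access_log.9), and "-r" is unnecessary for plain files. Deleting only the rotated copies is also the safe choice: removing a file that Apache still has open (the live access_log) would not free its space until Apache closed it. A slightly more careful variant, assuming GNU or BSD find (for -delete) and a hypothetical function name prune_rotated:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the rotated copies of a log file (name.1,
# name.2, ..., including compressed ones such as name.1.gz) in one directory,
# leaving the live log itself alone. Each removed path is printed first.
prune_rotated() {
  local dir="$1" base="$2"
  # -maxdepth 1: only this directory. The pattern requires a dot and a digit
  # after the base name, so "access_log" itself never matches, and neither
  # does "ssl_access_log.1" (-name matches the whole file name).
  find "$dir" -maxdepth 1 -type f -name "${base}.[0-9]*" -print -delete
}
```

For example, "prune_rotated /var/log/httpd access_log". Run the find command without -delete first to preview what would go. With several hundred MB per weekly copy here, this only buys time; the log rotation settings for httpd are where a lasting fix belongs.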

7) Now let’s check on /var/spool:

[root@servernms1 spool]# du . --max-depth=1 -h
8.0K    ./repackage
8.0K    ./lpd
32K     ./anacron
2.3G    ./mail
16K     ./cron
7.4M    ./clientmqueue
20K     ./at
8.0K    ./rwho
16K     ./cups
8.0K    ./vbox
64K     ./news
8.0K    ./samba
8.0K    ./squid
52K     ./mqueue
2.3G    .

Shocking: /var/spool/mail is the only folder in the GB range.

[root@servernms1 spool]# cd /var/spool/mail
[root@servernms1 mail]# du . --max-depth=1 -h
2.3G    .
[root@servernms1 mail]# ls -lah
total 2.3G
drwxrwxr-x  2 root    mail 4.0K Oct 10 15:35 .
drwxr-xr-x 16 root    root 4.0K May 11  2011 ..
-rw-rw----  1 focusxi mail    0 Jan 30  2011 focusxi
-rw-rw----  1 nagios  mail 230K Apr 18 09:53 nagios
-rw-------  1 root    root 2.3G Oct 10 15:35 root
-rw-rw----  1 rpc     mail    0 Jan  8  2010 rpc
-rw-rw----  1 zuls    mail    0 Jun 28  2011 zuls

So root has a 2.3GB mail file.

8) So let's try reading and emptying the root mailbox. Run the command "mail":

[root@servernms1 mail]# mail
/var/spool/mail/root: File too large.

This, I expected: the mail client refuses to open a mailbox this large, so there is not much we can do here. We could get into a long and boring HOWTO on repairing the file, but unless its contents are critical, the best thing to do is simply delete root's mailbox file and move on. The mailbox will be recreated as new mail arrives, and you can then read it to see what is coming in and deal with the source of the issue directly. Alternatively, you can "cat" and "grep" the file itself, as it is structured text, but expect that to take a long time on a 2GB+ file.
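Before deleting the mailbox, it can be worth finding out what is filling it. A rough sketch, assuming a standard mbox file and hypothetical function names: top_subjects counts the most common Subject: headers, and empty_mailbox truncates the file in place, which keeps its existing owner and permissions instead of relying on the mail system to recreate the file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for an oversized mbox file such as /var/spool/mail/root.

# Count the most frequent Subject: headers (top N, default 10) to see which
# notification is flooding the mailbox. grep -a treats the file as text even
# if it contains stray binary bytes. Body lines that happen to start with
# "Subject:" are counted too, so treat the result as approximate.
top_subjects() {
  grep -a '^Subject:' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Empty a mailbox in place. Truncating (rather than rm) keeps the same file,
# with its owner, group and permissions unchanged.
empty_mailbox() {
  : > "$1"
}
```

For example, run "top_subjects /var/spool/mail/root 5" (expect it to take a while on a 2GB+ file). If you want to keep a sample for later, "tail -c 10M /var/spool/mail/root > /root/root-mail.sample" copies the last 10 MB before you run "empty_mailbox /var/spool/mail/root".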

9) Check to ensure you now have enough free space, with the “df” command:

[root@servernms1 mail]# df / -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       19G   13G  4.9G  72% /
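Since the whole exercise started with a Nagios alert, you may want a quick way to re-check the filesystem the way a monitoring check would. This is not the real check_disk plugin, just a sketch that follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function name disk_status and the 80/90 thresholds are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical Nagios-style check (not the real check_disk plugin): print a
# status line for a mount point and return 0 (OK), 1 (WARNING), 2 (CRITICAL)
# or 3 (UNKNOWN) based on its Use% from `df -P`.
disk_status() {
  local mount="$1" warn="${2:-80}" crit="${3:-90}" use
  # NR == 2 is the data line; column 5 is the capacity, e.g. "72%".
  use=$(df -P "$mount" 2>/dev/null | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  if [ -z "$use" ]; then
    echo "DISK UNKNOWN - cannot read $mount"; return 3
  elif [ "$use" -ge "$crit" ]; then
    echo "DISK CRITICAL - $mount is ${use}% full"; return 2
  elif [ "$use" -ge "$warn" ]; then
    echo "DISK WARNING - $mount is ${use}% full"; return 1
  fi
  echo "DISK OK - $mount is ${use}% full"; return 0
}
```

For example, at the 72% usage shown above, "disk_status / 80 90; echo $?" should report DISK OK with exit code 0.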

Categories: Linux, nagios