
HOWTO: Audit NAGIOS for inclusion of all appropriate hosts

From time to time, Nagios (or any monitoring software) will fall out of sync with the systems that are actually installed.  The automated way to resolve this is a periodic network scan.  However, on many networks that were not designed to accommodate scanning, this will generate many false positives (e.g., secondary IPs, logical host names, DNS names without reverse DNS, cluster/heartbeat NICs, etc.).  The alternative is to compare Nagios against a known list.  This is the method I will describe here.

1) Log in to the Nagios server SERVERNMS1 via SSH.  Note that you must log in with a non-root account first, and then use “su -” to become root.

2) Change to the /usr/local/nagios/etc/hosts folder (cd /usr/local/nagios/etc/hosts)

[root@servernms1 hosts]# cd /usr/local/nagios/etc/hosts
[root@servernms1 hosts]# pwd
/usr/local/nagios/etc/hosts

3) As Nagios is not configured to either see or share SMB (CIFS) files, we will need to get our list via the console.  Perform an “ls -la” to get the full list of files.  Then copy the screen text to Excel:

[root@servernms1 hosts]# ls -la
total 1440
drwsrwsr-x 2 apache nagios 20480 Sep 24 12:01 .
drwxrwsr-x 8 apache nagios  4096 Sep 21 08:01 ..
-rw-rw-r-- 1 apache nagios   970 Sep 25 13:29 10.32.0.18.cfg
-rw-rw-r-- 1 apache nagios  1017 Sep 25 13:29 10.8.0.10.cfg
-rw-rw-r-- 1 apache nagios   994 Sep 25 13:29 10.8.0.1.cfg
-rw-rw-r-- 1 apache nagios  1240 Sep 25 13:29 barracuda1.cfg
-rw-rw-r-- 1 apache nagios  1240 Sep 25 13:29 barracuda2.cfg
-rw-rw-r-- 1 apache nagios  1123 Sep 25 13:29 brn-2901-isr.servercorp.ca.cfg
-rw-rw-r-- 1 apache nagios  1161 Sep 25 13:29 brn-2960-01.servercorp.ca.cfg
-rw-rw-r-- 1 apache nagios  1053 Sep 25 13:29 brn-ups-01.servercorp.ca.cfg
-rw-rw-r-- 1 apache nagios  1296 Sep 25 13:29 CAA-ACCESS01.cfg
-rw-rw-r-- 1 apache nagios  1298 Sep 25 13:29 CAA-ACCESS02.cfg
-rw-rw-r-- 1 apache nagios  1298 Sep 25 13:29 CAA-ACCESS03.cfg
-rw-rw-r-- 1 apache nagios  1306 Sep 25 13:29 CAA-ACCESS04.cfg
-rw-rw-r-- 1 apache nagios  1207 Sep 25 13:29 CAA-ACCESS05.cfg
-rw-rw-r-- 1 apache nagios  1270 Sep 25 13:29 CAA-DIST01.cfg
-rw-rw-r-- 1 apache nagios  1315 Sep 25 13:29 CAA-DIST02.cfg
-rw-rw-r-- 1 apache nagios  1228 Sep 25 13:29 CAB-ACCESS-02.cfg
-rw-rw-r-- 1 apache nagios  1291 Sep 25 13:29 CAC-ACCESS01.cfg

NOTE: We have no interest in the first 4 lines – the console prompt, the “total ###” or the “.” or “..” folders. 
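If you would rather not hand-trim the pasted text, the same filtering can be scripted.  The following is only a minimal Python sketch, not part of the original procedure: the function name cfg_filenames is my own, and it assumes the file names contain no spaces.

```python
def cfg_filenames(ls_output: str) -> list[str]:
    """Return the .cfg file names found in pasted `ls -la` output."""
    names = []
    for line in ls_output.splitlines():
        fields = line.split()
        # A long-format file entry has at least 9 fields; the name is last.
        # The prompt line, "total N", "." and ".." never end in ".cfg",
        # so they are skipped without special handling.
        if len(fields) >= 9 and fields[-1].endswith(".cfg"):
            names.append(fields[-1])
    return names


sample = """[root@servernms1 hosts]# ls -la
total 1440
drwsrwsr-x 2 apache nagios 20480 Sep 24 12:01 .
drwxrwsr-x 8 apache nagios  4096 Sep 21 08:01 ..
-rw-rw-r-- 1 apache nagios   970 Sep 25 13:29 10.32.0.18.cfg
-rw-rw-r-- 1 apache nagios  1240 Sep 25 13:29 barracuda1.cfg
"""

print(cfg_filenames(sample))
```

The sample text is a shortened copy of the listing above; only the two file entries survive the filter.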

Once pasted into Excel, it should auto break the text into columns:

clip_image001

4) Highlight the column with the names (I in this example).  Click on TEXT TO COLUMNS on the toolbar.

clip_image002

Set the DELIMITER to be “.” (period).  This will break apart <servername> and .CFG.  Note that many of the devices listed are (incorrectly, in my opinion) listed by IP address, so this will in fact mangle those items, as it will any fully qualified name that contains periods.

You will be left with a list that looks like:

clip_image003

5) Delete all columns other than I to leave only the hostnames that Nagios is configured for.
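As an alternative to splitting on every period, a script can strip only the trailing “.cfg”, which leaves IP addresses and fully qualified names intact.  This is a small Python sketch under that assumption; hostname_from_cfg is a name I made up for illustration.

```python
def hostname_from_cfg(filename: str) -> str:
    """Strip only a trailing '.cfg', leaving dots inside the name intact."""
    suffix = ".cfg"
    if filename.endswith(suffix):
        return filename[: -len(suffix)]
    return filename


for name in ["10.8.0.1.cfg", "brn-ups-01.servercorp.ca.cfg", "CAA-DIST01.cfg"]:
    print(hostname_from_cfg(name))
```

The three file names are taken from the listing in step 3.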

6) Get a list of servers that should be present.  There are many ways to do this: querying AD, querying vCenter, performing a Net View, etc.  I personally prefer a “Net View”.

Run: “net view | find /I "FSRV" >>SERVERS.LST” to get a list output to a text file.  (Note that “>>” appends, so delete any old SERVERS.LST first.)

The output will look similar to:

clip_image004

As you can see, we need to remove both the “\\” as well as the “Description” field.   Paste these contents into the XLS file in Column C.
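The same clean-up can be sketched in Python for anyone who prefers to skip the spreadsheet step.  This is an illustration only: hosts_from_net_view is a hypothetical helper, the remarks in the sample are invented, and it assumes each line of SERVERS.LST starts with “\\” followed by the server name.

```python
def hosts_from_net_view(text: str) -> list[str]:
    """Extract server names from `net view` lines such as '\\\\NAME  remark'."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\\\"):
            continue  # skip headers, blank lines and status messages
        parts = line[2:].split()
        if parts:
            hosts.append(parts[0])  # first token is the name; the rest is the remark
    return hosts


# Invented sample in the general shape of `net view` output.
sample = r"""\\FSRVVDF1             File server
\\FSRVCDFAP1           Application server
The command completed successfully.
"""

print(hosts_from_net_view(sample))
```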

7) Now that the contents are in Excel, we can use Text to Columns again to fix this data:

clip_image005

Highlight Column C and click TEXT TO COLUMNS.

clip_image006

Choose SPACE as a delimiter as well as OTHER = “\”.  Click FINISH.

clip_image007

Delete columns B, C, and E through I (or beyond), leaving only Columns A and D.  Once those columns are deleted, the former Column D becomes Column B.

8) Now that you have the two columns, A = “In Nagios” and B = “On Network”, we can compare them.

Column C: then needs the formula: =IF(ISERROR(MATCH(B1,A:A, 0)), "No Match", "Match")

This formula basically says “Take the value in B#, and look for it in Column A.  If it is found, enter ‘Match’, and if not, enter ‘No Match’.”  As you can see, we found some immediate “No Match” entries in our list.  Some, such as “SERVER1”, are logical references to another host (e.g., SERVERSAN1 in this case), and some are in fact simply missing (e.g., FSRVCDFAP1).
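For reference, the comparison the formula performs can be sketched in a few lines of Python.  The function name audit is my own, and the comparison is deliberately case-insensitive because Excel's MATCH ignores case; the host names are the ones mentioned in this step.

```python
def audit(in_nagios: list[str], on_network: list[str]) -> dict[str, str]:
    """Label each network host 'Match' or 'No Match', ignoring case like MATCH."""
    known = {name.lower() for name in in_nagios}
    return {
        host: ("Match" if host.lower() in known else "No Match")
        for host in on_network
    }


in_nagios = ["SERVERSAN1", "FSRVVDF1"]
on_network = ["SERVER1", "FSRVVDF1", "FSRVCDFAP1"]

for host, status in audit(in_nagios, on_network).items():
    print(host, status)
```

Here SERVER1 and FSRVCDFAP1 come back as “No Match”; deciding which of those is a logical alias and which is genuinely missing is still a manual judgment.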

9) Create a FILTER on ROW 1, and filter COLUMN C for “No Match”

clip_image008

The hosts in COLUMN B are the ones not present in Nagios.

10) In this case, we are particularly auditing for missing Windows Server Hosts – to be part of the HOSTGROUP “File & Print Servers (Windows) (fpservers)”.  (http://servernms1/nagiosxi/includes/components/nagioscore/ui/status.php?hostgroup=fpservers&style=overview)

11) These hosts can then be added to Nagios by means of cloning an existing server in the appropriate HOSTGROUP.  Open Nagios XI Configuration Manager, and select HOSTS.  Find FSRVVDF1 (a random selection from the above list, but it happens to be the top server). 

NOTE: We will be adding FSRVDCFAP1 in this example.

clip_image009

Click on the COPY icon under ACTIONS.

clip_image010

Click on the MODIFY icon next to the new copy.

clip_image011

Modify the HOST NAME and ADDRESS appropriately to reflect the new host SERVERCDFAP1.  Click on MANAGE HOSTGROUPS.

clip_image012

Here you can see that because we have cloned the entry, it already is a member of the hostgroup “fpservers”.  We could otherwise add/remove hostgroups here.  Click CLOSE.

Click MANAGE PARENTS:

clip_image013

This is where you would select a PARENT item.  This is useful for setting all devices of a type or location to have a parent device.  This parent device can then be modified, set offline, taken out for maintenance, and “all child hosts” can then be set for maintenance as well.  Thus, use caution when choosing which original HOST to clone – try to aim for a host that is in the same site.    Click CLOSE.

clip_image014

All “*” fields are required.  So copy the HOST NAME to the DISPLAY NAME.  Ensure you check the box for ACTIVE and click SAVE.
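For orientation only, the kind of host object that ends up behind these form fields can be sketched as a Nagios object definition.  The Python below just renders such a stub as text; render_host, the template name example-windows-template, and the address 192.0.2.10 are all placeholders of mine, and the stub is not a complete definition.  In Nagios XI the Configuration Manager generates the real files, so treat this as a reading aid rather than something to paste into /usr/local/nagios/etc/hosts.

```python
def render_host(host_name: str, address: str, hostgroups: list[str], template: str) -> str:
    """Render a minimal Nagios 'define host' stub as text (illustrative only)."""
    lines = [
        "define host {",
        f"    use           {template}",       # hypothetical template name
        f"    host_name     {host_name}",
        f"    alias         {host_name}",
        f"    display_name  {host_name}",      # HOST NAME copied to DISPLAY NAME
        f"    address       {address}",
        f"    hostgroups    {','.join(hostgroups)}",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder address from the documentation range, not a real host address.
print(render_host("FSRVDCFAP1", "192.0.2.10", ["fpservers"], "example-windows-template"))
```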

12) You will need to refresh your search, as the new item has changed its name, and your search was for the previous name:

clip_image015

When you do so, you will see the new item as above.  If ACTIVE is red and says NO, you failed to check the box.  Edit it and do so now.

Check the box next to the HOSTNAME and click APPLY CONFIGURATION.

13) When the configuration applies, you will see:

clip_image016

14) Because this document is not intended to be a “HOWTO: Configure Nagios”, I will not be getting into the finer details of services and SERVICE GROUPS.  However, if you search for the HOST now, you will see:

clip_image017

You can now add SERVICES as deemed appropriate.

The above steps could just as easily be done with INFRASTRUCTURE list exports, etc. 

NOTE: While this is NOT a general NAGIOS HOWTO, it is important to note the following concepts that we simply are not doing well with Nagios today.

SERVICES should be assigned to a SERVICEGROUP.  

– EG: CPU, MEMORY, UPTIME, C:, E:, etc should all be SERVICES, and they should all be assigned to a SERVICEGROUP of “WINDOWS SERVERS” as this is our minimum standard

– EG: XENAPP CONNECTED USERS, XENAPP DISCONNECTED USERS, etc, should all be SERVICES and should be assigned to a SERVICEGROUP of “XENAPP SERVERS”.

– EG: MSSQL_CONNECTED_USERS, MSSQL_CPU_BUSY, MSSQL_DATABASE_FREE, MSSQL_IO_BUSY, SQL AGENT SERVICE, SQL SERVICE, etc, should all be SERVICES and be assigned to a SERVICEGROUP of “SQL SERVERS”

HOSTS should be assigned to a HOSTGROUP.

– EG: SERVERXA1 and SERVERXA2 should be assigned to perhaps a few groups – “WINDOWS SERVERS” and “XENAPP SERVERS”

– EG: SERVERDB2, SERVERDB3, SERVERINFDB2, etc should be assigned to groups as well – “WINDOWS SERVERS” (for the bare minimum standard) and “SQL SERVERS” to capture SQL SERVERS services.
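To make the idea concrete, here is a rough Python model of it: each group carries a standard set of services, and a host simply inherits the union of whatever groups it belongs to.  This models the concept described above, not Nagios's actual object model, and the dictionary and function names are mine.

```python
# Standard services per group, using the examples listed above.
GROUP_SERVICES = {
    "WINDOWS SERVERS": ["CPU", "MEMORY", "UPTIME", "C:", "E:"],
    "XENAPP SERVERS": ["XENAPP CONNECTED USERS", "XENAPP DISCONNECTED USERS"],
    "SQL SERVERS": ["MSSQL_CONNECTED_USERS", "MSSQL_CPU_BUSY", "SQL SERVICE"],
}

# Group membership per host.
HOST_GROUPS = {
    "SERVERXA1": ["WINDOWS SERVERS", "XENAPP SERVERS"],
    "SERVERDB2": ["WINDOWS SERVERS", "SQL SERVERS"],
}


def services_for(host: str) -> list[str]:
    """Return the services a host inherits from its groups, in order, without duplicates."""
    services = []
    for group in HOST_GROUPS.get(host, []):
        for service in GROUP_SERVICES[group]:
            if service not in services:
                services.append(service)
    return services


print(services_for("SERVERDB2"))
```

Adding a new SQL server then means one line in HOST_GROUPS, and changing the Windows baseline means editing one list.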

You can see that, with proper use of HOSTGROUPS and SERVICEGROUPS, adding new servers would become almost amazingly easy.  The above method, with no services, shows that FOCUS is currently using a method whereby a previous HOST with services is cloned.  As there is no grouping or link to a “template”, hosts/services are non-uniform and cannot be updated by updating a central object, which makes changes both time consuming and prone to error.  This is something we should fix.

Also, there is a need to standardize on either lowercase or uppercase, as Nagios is Linux based and thus very case sensitive.
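A quick way to see how big the case problem is would be to look for names that differ only by case.  The following Python sketch does that; case_variants is a hypothetical helper and the lowercase name in the sample is invented for the demonstration.

```python
from collections import defaultdict


def case_variants(names: list[str]) -> dict[str, list[str]]:
    """Group names that are identical except for upper/lower case."""
    buckets = defaultdict(set)
    for name in names:
        buckets[name.lower()].add(name)
    # Keep only the buckets holding more than one distinct spelling.
    return {key: sorted(group) for key, group in buckets.items() if len(group) > 1}


# "caa-access01" is an invented duplicate; the other names are from step 3.
names = ["CAA-ACCESS01", "caa-access01", "barracuda1", "CAA-DIST01"]
print(case_variants(names))
```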
