
HOWTO: Mass configure Nagios for advanced monitoring

As you’re aware, Nagios is a pretty decent and freely available network monitoring tool.  Most people aren’t aware of the best way to use it or configure it in bulk.  I have an opportunity to add an entire TEST domain to be monitored (but not alerted on), and this seems like a great time to write up some documentation.

We’re going to make the following assumptions:

  • These are all Windows Server hosts of some sort – 2003/2008/2008R2/2012/2012R2 – maybe 2000, who knows.
  • These can all run the NSClient++ agent for Nagios – which can be distributed via GPO-based MSI installation and INI file creation if needed.
  • The systems are generally fairly standardized, with the following standards:
    • Disks are C: OS and E: for Data, with D: for Optical
    • SQL Servers have an H: for System DB, I: for User DB, and L: for Logs
    • IIS Servers exist, and have INETPUB on E:\

To accommodate this we’re going to make use of HOSTS, HOSTGROUPS, SERVICES, and SERVICEGROUPS in Nagios.  When we do this, we’ll see the following:

  • Services are created – C:, E:, RAM, CPU, etc.
  • Services are added to ServiceGroups    
    • ServiceGroups would be like SERVER_BASE, SERVER_SQL, and SERVER_IIS
  • Hosts are added for each host
  • Hosts are added to HostGroups
    • 3 SQL servers might be added to the SERVER_SQL host group
    • A HostGroup such as SERVER_SQL would get the base Service Group SERVER_BASE plus SERVER_SQL
  • We’re going to prefix these with “TST” to specify these settings are for the TST domain.  We could use a common set across all environments, but then we wouldn’t be able to modify the TST ones in advance to validate changes prior to promoting into Production.
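To make the layout above concrete, here is a minimal sketch of the idea in Python. The host names (SQL01, WEB01) and the exact service names are made up for illustration, and this is only a model of the relationships, not anything Nagios itself runs.

```python
# Hypothetical model: services attach to host groups, hosts join host groups,
# and a host picks up every service of every group it belongs to.
HOSTGROUP_SERVICES = {
    "TST_SERVER": ["CPU", "RAM", "Disk C:", "Disk E:"],
    "TST_SERVER_SQL": ["Disk H:", "Disk I:", "Disk L:"],
    "TST_SERVER_IIS": ["INETPUB on E:"],
}

# Hypothetical hosts and their group memberships.
HOST_MEMBERSHIP = {
    "SQL01": ["TST_SERVER", "TST_SERVER_SQL"],
    "WEB01": ["TST_SERVER", "TST_SERVER_IIS"],
}


def services_for(host):
    """Return the services a host inherits, in group order."""
    services = []
    for group in HOST_MEMBERSHIP[host]:
        services.extend(HOSTGROUP_SERVICES[group])
    return services


for host in HOST_MEMBERSHIP:
    print(host, "->", ", ".join(services_for(host)))
```

The point of the design is that nothing is defined per host: adding a host is one line of membership, and it gets the full set of checks.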

1) Login to NagiosXI as your normal user

2) Click on CONFIGURE on the top, then CORE CONFIG MANAGER on the left, and then login as “nagiosadmin”.

clip_image002

3) On the left, under MONITORING click HOST GROUPS.

clip_image004

Click ADD NEW.

clip_image006

Give your HOSTGROUPNAME and DESCRIPTION.  Here we’re going to create HOSTGROUPS for “TST_SERVER”, “TST_SERVER_DC”, “TST_SERVER_SQL”, etc. 
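If you would rather see what this step amounts to as text, the sketch below prints Nagios Core style `define hostgroup` blocks for the three groups named above. It assumes you only need `hostgroup_name` and `alias`; the alias wording is my own, and the Core Config Manager writes its own files, so treat this as an illustration rather than something to paste in.

```python
def render_hostgroup(name, alias):
    """Render one Nagios Core hostgroup object definition as text."""
    return (
        "define hostgroup {\n"
        f"    hostgroup_name  {name}\n"
        f"    alias           {alias}\n"
        "}\n"
    )


# Group names from this walkthrough; the aliases are illustrative.
GROUPS = [
    ("TST_SERVER", "TST domain - all servers"),
    ("TST_SERVER_DC", "TST domain - domain controllers"),
    ("TST_SERVER_SQL", "TST domain - SQL servers"),
]

hostgroup_config = "\n".join(render_hostgroup(n, a) for n, a in GROUPS)
print(hostgroup_config)
```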

4) Click SERVICE GROUPS under MONITORING on the left.  Then click ADD NEW.

clip_image008

5) Name the Service Group and give it a description.   Click SAVE.

clip_image010

Repeat this for your other service groups – presumably TST_SERVER_SQL and TST_SERVER_IIS as examples.

clip_image012

Click on the APPLY CONFIGURATION button.

6) Next, we’ll add some Services.  We’ll make the assumption that all previously existing services are added on a 1:1 basis to hosts, and thus aren’t really what we want to use, other than perhaps as a template.  Click on SERVICES under MONITORING on the left.  Search for “CPU” to find an existing CPU sensor if one exists.  Click on it to edit it:

clip_image014

clip_image016

Here you can see how the service is configured – we’re running the “check_xi_service_nsclient” command, using arguments where $ARG1$ is a password hash for NSClient++, CPULOAD is the sensor, and “-l 5,85,95” tells it to check the 5-minute average, with a warning at 85% and critical at 95%.  The most important thing here is that your NSCLIENT hash matches what is on your systems.  Click ABORT.
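As a rough sketch of what ends up being run, the snippet below assembles a `check_nt` command line from those arguments. It assumes that “check_xi_service_nsclient” wraps the standard `check_nt` plugin and talks to NSClient++ on port 12489; check the command definition on your own system. The host name and the CHANGE_ME password are placeholders.

```python
import shlex


def build_check_nt(host, password, variable, extra="", port=12489):
    """Build the argv for a check_nt call (assumed wrapper target)."""
    argv = ["check_nt", "-H", host, "-s", password,
            "-p", str(port), "-v", variable]
    # Extra parameters such as "-l 5,85,95" are split like a shell would.
    argv += shlex.split(extra)
    return argv


# Hypothetical host; CPULOAD over 5 minutes, warn at 85%, critical at 95%.
cmd = build_check_nt("fsrvtst01.test.local", "CHANGE_ME", "CPULOAD", "-l 5,85,95")
print(shlex.join(cmd))
```

Running the printed command by hand from the Nagios server is a quick way to confirm the password matches before you build a dozen services on top of it.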

Click ADD NEW:

clip_image018

clip_image020

Configure the CONFIG NAME to specify that it belongs to TST_SERVER and call it _CPU.  The Description you likely want to be something that will display in a human readable format.  Enter your ARG’s as shown.  Click TEST CHECK COMMAND.

clip_image022

Enter the hostname of a system to check as a test.  Click OK

clip_image024

Verify you get OUTPUT and click close.

Click on the CHECK SETTINGS tab:

clip_image026

Change the INITIAL STATE to O for “OK” – this will let it assume it is good, until it knows otherwise.  (For a service, U means UNKNOWN, not “UP”.)

Change the CHECK INTERVAL to 5, RETRY INTERVAL to 1, and MAX CHECK ATTEMPTS to 5.  Change the CHECK PERIOD to 24x7.
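It helps to know what those numbers mean for how quickly a problem is confirmed. The arithmetic below is a sketch that assumes the default interval length of 60 seconds (so the values are minutes) and that the first failing check counts as attempt 1.

```python
def minutes_to_hard_state(retry_interval, max_check_attempts):
    """Minutes from the first failed check until the problem is confirmed."""
    # Attempt 1 is the failing check itself; the rest are retries.
    return (max_check_attempts - 1) * retry_interval


def worst_case_detection(check_interval, retry_interval, max_check_attempts):
    """Worst case: the fault starts right after a good check."""
    return check_interval + minutes_to_hard_state(retry_interval, max_check_attempts)


print(minutes_to_hard_state(1, 5))    # 4 retries, 1 minute apart
print(worst_case_detection(5, 1, 5))  # plus up to one full check interval
```

With 5/1/5, a failure is confirmed 4 minutes after it is first seen, and at worst about 9 minutes after it actually started.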

Click COMMON SETTINGS tab again.

Click MANAGE SERVICEGROUPS:

clip_image028

clip_image030

Click the SERVICE GROUP to add to, and click ADD SELECTED.  You’ll see it show up under ASSIGNED.  Click CLOSE. Click SAVE.

clip_image032

Enter TST_SERVER in the search box.  You should now find your new config.  Click COPY and then EDIT the copy to configure additional services using the “check_xi_service_nsclient” command, with the following options:

  • CPULOAD / -l 5,85,95
  • MEMUSE / -w 80 -c 95
  • USEDDISKSPACE / -l C -w 80 -c 90
  • SERVICESTATE / -d SHOWALL -l VMTOOLS
  • UPTIME

clip_image034

Ensure when copying and modifying that you check the ACTIVE box!

Click MANAGE HOSTGROUPS.

clip_image036

Add your HOST GROUP (TST_SERVER in this case) and click CLOSE.
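For reference, here is a sketch of how a few of these services might look as Nagios Core object definitions attached to the host group. The service descriptions and the CHANGE_ME password are placeholders, and the output is deliberately incomplete: a real service definition also needs contacts and notification settings, or a `use` template that supplies them.

```python
PASSWORD = "CHANGE_ME"  # placeholder for the NSClient++ password ($ARG1$)

# (description, check_nt variable, extra parameters); descriptions are illustrative.
CHECKS = [
    ("CPU Usage", "CPULOAD", "-l 5,85,95"),
    ("Drive C: Disk Usage", "USEDDISKSPACE", "-l C -w 80 -c 90"),
    ("Uptime", "UPTIME", ""),
]


def render_service(hostgroup, description, variable, params):
    """Render one service definition assigned to a host group."""
    # Nagios separates command arguments with "!"; skip empty ones.
    args = "!".join(a for a in (PASSWORD, variable, params) if a)
    return (
        "define service {\n"
        f"    hostgroup_name         {hostgroup}\n"
        f"    service_description    {description}\n"
        f"    check_command          check_xi_service_nsclient!{args}\n"
        "    check_interval         5\n"
        "    retry_interval         1\n"
        "    max_check_attempts     5\n"
        "    check_period           24x7\n"
        "}\n"
    )


service_config = "\n".join(render_service("TST_SERVER", *check) for check in CHECKS)
print(service_config)
```

Note that `hostgroup_name` is used instead of `host_name`, which is what lets every member of TST_SERVER pick the check up automatically.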

You should now have a number of services:

clip_image038

Click APPLY CONFIGURATION.

7) Click on the left under MONITORING and click HOSTS.

clip_image040

Click ADD NEW:

clip_image042

Enter the HOST NAME/DESCRIPTION/ADDRESS/DISPLAY NAME.  I like to use UPPER CASE for the HOSTNAME and lower case for the FQDN portion.  While you could use short names and keep the FQDN only in the “address” field used to find the host, consider a situation where you have “WSUS.PROD.LOCAL”, “WSUS.TEST.LOCAL”, and “WSUS.DEV.LOCAL” – if you get an alert for “WSUS”, which host is it?  Ensure you select a basic CHECK COMMAND, something like “check_xi_service_ping” or “check-host-alive”, for a basic ping check.  Check the ACTIVE box.  Click on the CHECK SETTINGS tab:

clip_image044

Change the INITIAL STATE to O (for a host, O means UP and U means UNREACHABLE), CHECK INTERVAL to 5, RETRY INTERVAL to 1, MAX CHECK ATTEMPTS to 1, and change CHECK PERIOD to 24x7.  Click the COMMON SETTINGS tab.

Click MANAGE HOSTGROUPS

clip_image046

Find the HOST GROUP and click ADD SELECTED.  Click CLOSE.  Then click SAVE.
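When you have many hosts to add, the same fields can be generated from a list. The sketch below applies the naming convention above (UPPER CASE host, lower case domain) and prints Nagios Core style `define host` blocks. The FQDNs are hypothetical, and a real definition would also need contacts and notification settings or a template.

```python
def format_host_name(fqdn):
    """UPPER CASE host part, lower case domain part."""
    short, _, domain = fqdn.partition(".")
    return f"{short.upper()}.{domain.lower()}" if domain else short.upper()


def render_host(fqdn, hostgroups):
    """Render one host definition with the check settings used above."""
    name = format_host_name(fqdn)
    return (
        "define host {\n"
        f"    host_name             {name}\n"
        f"    alias                 {name}\n"
        f"    address               {fqdn.lower()}\n"
        f"    hostgroups            {','.join(hostgroups)}\n"
        "    check_command         check-host-alive\n"
        "    check_interval        5\n"
        "    retry_interval        1\n"
        "    max_check_attempts    1\n"
        "    check_period          24x7\n"
        "}\n"
    )


# Hypothetical inventory: FQDN and the host groups it should join.
INVENTORY = [
    ("fsrvtst01.test.local", ["TST_SERVER", "TST_SERVER_DC"]),
    ("fsrvtst02.test.local", ["TST_SERVER"]),
]

host_config = "\n".join(render_host(fqdn, groups) for fqdn, groups in INVENTORY)
print(host_config)
```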

8) If you now go back to NAGIOS itself, you can QUICK FIND for “FSRVTST” to find all hosts with this substring:

clip_image048

clip_image050

Here you can see how the two DCs not only have the TST_SERVER services, but also have the TST_SERVER_DC services.  Note that the hosts that are NOT members of the TST_SERVER_DC host group do not show the services assigned to that HOST GROUP.

So what do you do from here?

To add/modify SERVICES to a HOST GROUP:

  • Add new SERVICES
  • Assign the SERVICES to a HOST GROUP

To add new HOST GROUPS:

  • Create any new HOST GROUPS as needed. 
  • Add new SERVICES
  • Assign the SERVICES to a HOST GROUP

To add new HOSTS:

  • Add new hosts, in bulk, and add them to the HOST GROUP – and the services and settings are all done. 

You can see that, now that these templates are created, it is very simple to manage monitoring by policy.  Suppose you want to change the Warning/Critical limits on disk space from 80/90% to 85/95% – you now change the service assigned to the HOST GROUP, and you’re finished – whether it is 1 or 100 hosts.
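As a small illustration of that threshold change, the sketch below rewrites the `-w` and `-c` values in a disk check’s parameter string. The helper name is my own; in practice you would simply edit the one service in the Core Config Manager.

```python
import re


def set_thresholds(params, warn, crit):
    """Replace the -w and -c values in a check parameter string."""
    params = re.sub(r"-w\s+\d+", f"-w {warn}", params)
    params = re.sub(r"-c\s+\d+", f"-c {crit}", params)
    return params


old = "-l C -w 80 -c 90"
new = set_thresholds(old, 85, 95)
print(old, "=>", new)
```

Because the service hangs off the host group, that single edit is the whole change, however many hosts are members.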

In order for this all to work though, you MUST HAVE STANDARDS.  For example, I noted that servers in this environment have a C: and E: drive, and D: is optical.  So we can see one of our servers has a “code 139 out of bounds” on E: drive:

clip_image052

clip_image054

When we check the server, we see very clearly that what should be E: is D:.  One could suggest you simply modify Nagios.  However, the CORRECT plan of action would be to FIX the DEVIATION.  If you do not, other scripts, assumptions, monitoring, tools, etc. will ALSO be incorrect.  So if nothing else, you may utilize this method of monitoring to help you locate deficiencies.
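The same idea can be expressed as a simple standards check. This is only a sketch: the drive inventory format here is hypothetical (you would have to collect it yourself, for example from an inventory export), and the standard is the C:/D:/E: layout described earlier.

```python
# The standard from this environment: C: OS, D: optical, E: data.
STANDARD = {"C": "fixed", "D": "optical", "E": "fixed"}


def deviations(actual):
    """List where a server's drive layout differs from the standard."""
    problems = []
    for letter, expected in sorted(STANDARD.items()):
        found = actual.get(letter, "missing")
        if found != expected:
            problems.append(f"{letter}: expected {expected}, found {found}")
    return problems


# Hypothetical server where the data disk was mounted as D: instead of E:.
for problem in deviations({"C": "fixed", "D": "fixed"}):
    print(problem)
```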
