HOWTO: Monitoring Dell C6100 IPMI with Nagios.

Recently, I’ve been working with Nagios for network monitoring.  I have to admit, I came in rather biased, and was frustrated with it.  My frustrations will best be covered in another post.  In my home lab, however, I decided that I was going to make Nagios sing.  This is the first HOWTO I’m doing, although really the first one should have been installing the VM and getting things running.  I’ll do that one soon.

While this HOWTO is going to seem very long, once you get used to how to configure the basics, this all has a very nice rhythm to it.  Is it better or worse than other monitoring apps?  Maybe.  But it is what it is – and I don’t think it’s that bad!

GOAL:   Monitor IPMI (e.g. SuperMicro IPMI, Dell C6100 Series IPMI, Dell iDRAC, IBM RMU, etc.) via Nagios.

1) Find your Nagios plugin of choice.  I did this by searching the Nagios Plugin Directory for Popularity.  This brought me to WFISCHER’s IPMI Sensor Monitoring Plugin  (http://exchange.nagios.org/directory/Plugins/Hardware/Server-Hardware/IPMI-Sensor-Monitoring-Plugin/details).

clip_image002

Click on the DOWNLOAD URL, and save the file somewhere – like your DOWNLOADS folder:

clip_image004

Next, unpack the file – it’s a TAR.GZ, with a TAR inside.  So use 7Zip or something.

clip_image006

Open the TAR:

clip_image008

Extract this somewhere, such as D:\TEMP2:

clip_image010
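If you would rather unpack on a Linux box (the Nagios VM, for example) than in 7Zip, something like the following should do the same job.  This is only a sketch: the function name is made up for illustration, and the paths in the usage line are placeholders, so substitute whatever your download is actually called.

```shell
# unpack_plugin ARCHIVE DEST   (hypothetical helper, not part of the plugin)
# Extract a .tar.gz into DEST and print the path of any check_ipmi_sensor file found.
unpack_plugin() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # -z strips the gzip layer and -x unpacks the tar inside it, in one pass.
    tar -xzf "$archive" -C "$dest" || return 1
    find "$dest" -type f -name check_ipmi_sensor
}
```

Usage would look like “unpack_plugin ~/Downloads/plugin.tar.gz /tmp/ipmi_plugin”, where both paths are placeholders.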

2) Open NagiosXI and login.

clip_image012

Click on CONFIGURE on the top and then CORE CONFIG MANAGER on the left.

clip_image014

Click on MONITORING PLUGINS

clip_image016

Click BROWSE, locate the file “check_ipmi_sensor” in the folder above.

clip_image018

Then click UPLOAD PLUGIN.

clip_image020

The plugin is now showing as installed.

In the table in the bottom half of the window, confirm the file is present:

clip_image022

Again click CONFIGURE, and CORE CONFIG MANAGER.  Then click APPLY SETTINGS so the uploaded file is now part of the configuration.

clip_image024

clip_image026

clip_image028

There we go, Nagios Core now knows about the plugin file.

3) Before we go any further, let’s take a look at the README file that came with the package:

clip_image030

Aha!  Requirements!

The Nagios VM we downloaded is CentOS 6 based.  So use PuTTY to SSH to the host (10.0.0.150 in my case) and log in as “root”, default password “nagiosxi” (you really should change this).

Let’s get FreeIPMI installed.  Run “yum install freeipmi”:

clip_image032

In my case, I already have it installed.  If it were not installed, it would say that it found the package and ask “Do you wish to install: Y/N” and you would answer yes.

Next, let’s get Perl IPC::Run installed.  Run “yum install perl-IPC-Run”:

clip_image034

Same applies here.

NOTE: You may wish to do a general “yum update” and let it update all currently installed packages.  That’s up to you, YMMV and if you break it, you bought it.

So now we have our pre-requisites installed.
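A quick way to double-check both prerequisites from the shell is sketched below.  It assumes the FreeIPMI package puts an “ipmi-sensors” command on the PATH and that “perl” is available; the function names are invented for this example.

```shell
# Succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Print what is missing; return non-zero if anything is.
check_prereqs() {
    missing=0
    if ! have_cmd ipmi-sensors; then
        echo "missing: freeipmi (ipmi-sensors not found)"
        missing=1
    fi
    if ! have_perl_module IPC::Run; then
        echo "missing: perl-IPC-Run"
        missing=1
    fi
    return "$missing"
}
```

Running “check_prereqs” should print nothing if both packages are in place.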

4) Let’s test the plugin from the command line.  Run “cd /usr/local/nagios/libexec”:

clip_image036

Okay, so the plugin IS in the plugins folder!  Good.

Now run “./check_ipmi_sensor”:

clip_image038

Guess we’ll need to feed it some parameters.  On my C6100, the IPMI user is “root” with a default password of “root” (yeah, you should change that too).  The IPMI privilege level can be USER, OPERATOR or ADMIN, but USER is sufficient for reading sensors.  You may want to create a dedicated IPMI user account rather than using root; the choice is yours.  My 4 C6100 nodes’ IPMI IP addresses are 10.0.0.241-244.

So run “./check_ipmi_sensor -H 10.0.0.241 -U root -P root -L user”

clip_image040

Look at that.  It’s practically magic.

Now, we know that the pre-requisites are working and that the check command works from the Linux command line.  So if it doesn’t work from here – it’s a Nagios problem!
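To repeat that test against all four nodes in one go, a loop along these lines could be used.  It relies on the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); lumping every other exit code in with UNKNOWN is a simplification of this sketch, and the function names and environment variables are hypothetical.

```shell
# Map a plugin exit code to its Nagios state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3, plus anything unexpected
    esac
}

# Run the plugin against each IP given and print "IP STATE" per node.
# PLUGIN, IPMI_USER and IPMI_PASS are hypothetical overrides for this sketch.
check_nodes() {
    plugin="${PLUGIN:-/usr/local/nagios/libexec/check_ipmi_sensor}"
    for ip in "$@"; do
        "$plugin" -H "$ip" -U "${IPMI_USER:-root}" -P "${IPMI_PASS:-root}" -L user >/dev/null 2>&1
        rc=$?
        echo "$ip $(nagios_state "$rc")"
    done
}
```

For the lab described here that would be “check_nodes 10.0.0.241 10.0.0.242 10.0.0.243 10.0.0.244”.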

5) Let’s start by creating a HOSTGROUP.  HOSTGROUPS are used to group hosts together (like that?) so that you can manage them by group vs individually.  The nice thing about this is say you decide to add a sensor – do you want to add it to 50 devices or 1 host group?  I thought so.

Click on CONFIGURE, CORE CONFIG MANAGER.  On the left under MONITORING, click HOST GROUP:

clip_image042

Here you can see the default host groups.  We’re going to click ADD NEW.

clip_image044

We’re just going to give it a HOSTGROUP NAME and a DESCRIPTION.  Note that on the left, we could MANAGE HOSTS and MANAGE HOSTGROUPS – but because we’re starting here, we have none of either.  But Nagios is chicken-egg.  We could add 40 hosts, then add a hostgroup, then when creating the hostgroup, add the 40 hosts to the hostgroup.  Make sure that ACTIVE box is checked.  Click SAVE.

clip_image046

And as it says, click APPLY CONFIGURATION to make the changes take effect.

clip_image048

clip_image050
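For reference, what the GUI builds here corresponds roughly to a Nagios Core hostgroup object.  The helper below just prints such a definition; it is a sketch, the function name is invented, the alias text in the usage line is made up, and the file NagiosXI actually writes may contain more fields.

```shell
# hostgroup_def NAME ALIAS   (hypothetical helper)
# Print a minimal Nagios Core hostgroup definition.
hostgroup_def() {
    cat <<EOF
define hostgroup {
    hostgroup_name  $1
    alias           $2
}
EOF
}
```

For example, “hostgroup_def server-hardware "Server hardware"” prints a definition for the hostgroup used in this HOWTO.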

Alright, now let’s go get some hosts!

6) Let’s configure us some Hosts and Services.

Click on CONFIGURE, CORE CONFIG MANAGER.  On the left under MONITORING, click HOSTS:

clip_image052

Here you can see I’ve already configured two of the hosts.  I’m going to configure the 3rd to show how this looks.

Click ADD NEW.

clip_image054

Enter a HOSTNAME (logical, not actual), ADDRESS (I’m using IP Address as I realized I haven’t set up the IP’s with DNS names yet, my bad), and DISPLAY NAME (probably best to use the same as HOSTNAME – whatever standard makes you happy).

Ensure that ACTIVE on the right is checked.    Now, if you’re familiar with Nagios at all (mostly just a little), you’ll think “But….. what about the CHECK COMMAND?  We need a check command!”.  No, we don’t.  Remember, we’re going to add all the services we want to monitor to the HOST GROUP!

Click on the CHECK SETTINGS tab:

clip_image056

Ensure that CHECK INTERVAL is set to something such as 5 minutes, RETRY INTERVAL (used after a check fails the first time) to something like 1 minute, and MAX CHECK ATTEMPTS to 3-5 – whatever keeps you happy.  If these are left empty, you’ll get an error later on.

Click SAVE.

clip_image058

You’ll see that the DATABASE ENTRY was successfully updated.  But the SYNC STATUS is SYNC MISSED.  We need to APPLY CONFIGURATION – but let’s not do that just yet.  Click on the edit icon (clip_image060) to configure the host again.

clip_image062

This time, let’s click on MANAGE HOSTGROUPS.

clip_image064

On the left, under HOSTGROUPS, find the previously created HOSTGROUP “server-hardware” and click ADD SELECTED.  Then click CLOSE.  Then click SAVE.

We’ve now added the HOST to the HOSTGROUP.  We’re not going to configure anything individually on the HOST, we’re going to do it all by HOSTGROUPS.

clip_image066

Here you can see the SYNC MISSED for all 3 hosts, as I’ve added them all to the HOSTGROUP behind the scenes.

Click APPLY CONFIGURATION.
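In Nagios Core terms, the host we just built looks something like the definition printed below.  This is a sketch trimmed to the fields discussed above (a real host definition carries more, such as check period and contacts), and the host name and address in the usage line are assumed values for the 3rd node, not taken from the screenshots.

```shell
# host_def NAME ADDRESS HOSTGROUP   (hypothetical helper)
# Print a trimmed Nagios Core host definition with the check settings from this step.
host_def() {
    cat <<EOF
define host {
    host_name           $1
    alias               $1
    address             $2
    hostgroups          $3
    check_interval      5
    retry_interval      1
    max_check_attempts  3
}
EOF
}
```

For example: “host_def NW-ESXI03-IDRAC 10.0.0.243 server-hardware”.  Note there is no check_command line, matching the approach of hanging everything off the hostgroup.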

7) Next, in Nagios, click on HOME -> QUICK FIND and enter a substring of “NW-ESX”:

clip_image068

You’ll see a suggestion list pop up, but just click GO:

clip_image070

So what you see here on the first two is that I set them up previously WITH a check command on the host for PING.  Ignore this.  But what you see is that the two new ones I’ve added show PENDING.  And they’ll never get beyond PENDING, as there is no check.

8) Click on ADMIN -> CORE CONFIG MANAGER -> COMMANDS -> COMMANDS:

clip_image072

clip_image074

Here I HAVE already configured the command, but let’s click ADD NEW to simulate what it would look like.

clip_image076

<snip>

clip_image078

So here we want to:

Enter the COMMAND NAME.  This is the same command you ran at the command line – “check_ipmi_sensor”.  Note that sometimes this might have an extension, such as “check_ipmi_sensor.sh” or “check_ipmi_sensor.pl”, etc.  Ours does NOT.

On the command line enter “$USER1$/check_ipmi_sensor” – this is always going to be the case.  $USER1$ is the plugin folder.  Same rules apply about watching for an extension on the file.

The other parameters should look familiar based on the command line.   -U -P -L relate to USER/PASS/LEVEL.  Click SAVE.

Click APPLY CONFIGURATION.
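Since part of the screenshot is snipped, here is a guess at what an equivalent Nagios Core command object could look like.  The ARG numbering is an assumption (ARG1-ARG3 for user, password and level), so it may not match the screenshot; $HOSTADDRESS$ is the standard Nagios macro for the host’s address.

```shell
# Print an assumed command definition for the plugin.
# The heredoc delimiter is quoted so the $...$ Nagios macros are printed literally.
command_def() {
    cat <<'EOF'
define command {
    command_name  check_ipmi_sensor
    command_line  $USER1$/check_ipmi_sensor -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -L $ARG3$
}
EOF
}
```

Nagios fills in the $ARGn$ macros from the service’s check command when the check runs.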

9) Click on CONFIGURE -> CORE CONFIG MANAGER -> MONITORING -> SERVICES:

clip_image080

No services are defined.  So let’s click ADD NEW.

clip_image082

Enter the CONFIGNAME and DESCRIPTION.  I don’t know that either of these really matters, but I’ve chosen to name them the same as the command.  Enter a DISPLAY NAME, this is what you’ll see in the HOSTS/SERVICES list.

Change the CHECK COMMAND to “check_ipmi_sensor” from the list and check ACTIVE.  You’ll note the COMMAND VIEW shows the same details we entered in the previous COMMAND configuration.  I made a mistake and used ARG2/ARG3/ARG4 thinking HOSTNAME was ARG1, but it doesn’t matter, as long as the values you put into the ARGs match their place in the command line.
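To see why only the positions matter, here is a simplified imitation of how Nagios substitutes arguments: a check command is written as the command name followed by “!”-separated values, and the Nth value replaces $ARGn$.  The function is a sketch, not Nagios code; it uses ARG1-ARG3 rather than the ARG2-ARG4 mentioned above, and it would break on values containing “|” or “&”.

```shell
# expand_command COMMAND_LINE HOST_ADDRESS CHECK_COMMAND   (hypothetical helper)
# Replace $HOSTADDRESS$ and $ARGn$ in COMMAND_LINE using "name!arg1!arg2!...".
expand_command() {
    command_line="$1"
    host_address="$2"
    check_command="$3"
    # Everything after the first "!" is the argument list (empty if there is no "!").
    case "$check_command" in
        *!*) args="${check_command#*!}" ;;
        *)   args="" ;;
    esac
    result=$(printf '%s' "$command_line" | sed "s|\\\$HOSTADDRESS\\\$|$host_address|g")
    n=1
    old_ifs="$IFS"
    IFS='!'
    for arg in $args; do
        result=$(printf '%s' "$result" | sed "s|\\\$ARG$n\\\$|$arg|g")
        n=$((n + 1))
    done
    IFS="$old_ifs"
    printf '%s\n' "$result"
}
```

Called with the command line '$USER1$/check_ipmi_sensor -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -L $ARG3$', the address 10.0.0.241 and the check command 'check_ipmi_sensor!root!root!user', it prints the same command we ran by hand in step 4, with $USER1$ left for Nagios to resolve.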

Click TEST CHECK COMMAND:

clip_image084

Enter an IP address of a sample host, and click OK

clip_image086

Looks like what we got at the command line.  Nice.  Click CLOSE.

Now click MANAGE HOSTGROUPS:

clip_image088

clip_image090

Click the “server-hardware” hostgroup, click ADD SELECTED and click CLOSE.

Click on the CHECK SETTINGS tab:

clip_image092

Same as for hosts, ensure CHECK INTERVAL, RETRY INTERVAL and MAX CHECK ATTEMPTS are filled in.  Click SAVE.

clip_image094

Can you guess what we do now?  Click APPLY CONFIGURATION.
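The service we just saved maps roughly onto a Nagios Core service object attached to a hostgroup rather than to a host.  The helper below prints such a definition as a sketch; the function name is invented, and the argument order in the usage line (user, password, level) is an assumption that has to match however your command was defined.

```shell
# service_def HOSTGROUP DESCRIPTION CHECK_COMMAND   (hypothetical helper)
# Print a trimmed Nagios Core service definition bound to a hostgroup.
service_def() {
    cat <<EOF
define service {
    hostgroup_name       $1
    service_description  $2
    check_command        $3
    check_interval       5
    retry_interval       1
    max_check_attempts   3
}
EOF
}
```

For example: “service_def server-hardware check_ipmi_sensor 'check_ipmi_sensor!root!root!user'”.  Because the definition names the hostgroup, every host in “server-hardware” picks up the service.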

10) If you go back to the Nagios window (I keep a NAGIOS and a NAGIOS ADMIN tab open), and click HOME -> QUICK FIND, enter “NW-ESX” and click GO:

clip_image096

You see all 4 of our hosts suddenly have a service!  And they’re all pending.   Given a little bit of time, they’ll start to check:

clip_image098

Click on the NW-ESXI01-IDRAC CHECK_IPMI_SENSOR service that shows IPMI STATUS: OK

clip_image100

Well that’s boring.  I was hoping for more detail.  Maybe click PERFORMANCE GRAPHS:

clip_image102

(I had to change the zoom level to get more detail on the screen).

Oh would you look at that.  So our one sensor is multi-channeled.  We get all our sensors in one polling.  It also creates a chart for each of them.  That’s pretty handy, so we can now trend our fan/temp/etc.
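The per-sensor charts come from the performance data a plugin prints after a “|” in its output, as space-separated label=value items (optionally followed by “;”-separated thresholds).  The sketch below splits such a line into one “label value” pair per sensor.  The sample sensor names in the usage line are invented, and labels containing spaces (which Nagios allows when quoted) are not handled.

```shell
# split_perfdata "PLUGIN OUTPUT | label=value[;warn;crit...] ..."   (hypothetical helper)
# Print one "label value" line per performance data item.
split_perfdata() {
    # Keep only the part after the first "|".
    perf="${1#*|}"
    for item in $perf; do
        label="${item%%=*}"
        rest="${item#*=}"
        value="${rest%%;*}"   # drop any ;warn;crit;min;max suffix
        echo "$label $value"
    done
}
```

For example, “split_perfdata "IPMI Status: OK | FAN_1=4200.00 Inlet_Temp=24.00;30;40"” yields one line for FAN_1 and one for Inlet_Temp, which is the same split Nagios makes when it draws one graph per sensor.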

So what we have done so far is:

· Upload a new plugin.

· Install plugin dependencies

· Test the plugin at the command line to verify it works outside of Nagios

· Create a hostgroup

· Create hosts and add them to a hostgroup

· Create a command from the plugin

· Create a service tied to the command

· Add the service to a hostgroup – which automatically adds them to all hosts in the hostgroup.

· Verify that each host’s single IPMI service reports all of its sensors, not just one, and logs the historical detail for each.

To further demonstrate how hostgroups and services work, let’s add another service – just a basic PING service.

11) Click on CONFIGURE -> CORE CONFIG MANAGER -> MONITORING -> SERVICES:

clip_image104

Click ADD NEW

clip_image106

Change the CHECK COMMAND to “check_xi_host_ping”, which is pre-defined.  Check ACTIVE.  Note that the command wants ARG1-ARG4.  These are the thresholds passed to “-w” (warning level) and “-c” (critical level).  Let’s say that 3,5 and 10,20 (ms response) indicate those levels.  Enter the CONFIG NAME and DESCRIPTION (which I again make match the CHECK COMMAND) and then a DISPLAY NAME.
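As a rough illustration of how warning and critical thresholds turn a measurement into a state, consider the sketch below.  It is not the ping plugin’s actual logic: it compares a single value against one warning and one critical threshold, the function name is invented, and the real plugin’s handling of exact boundary values and of its second threshold in each pair may differ.

```shell
# threshold_state VALUE WARN CRIT   (hypothetical helper)
# Print OK, WARNING or CRITICAL for a measured value such as a response time in ms.
threshold_state() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        # Adding 0 forces numeric comparison.
        if (v + 0 >= c + 0)      print "CRITICAL"
        else if (v + 0 >= w + 0) print "WARNING"
        else                     print "OK"
    }'
}
```

With a warning threshold of 3 and a critical threshold of 10, a 1.2 ms reply would be OK, 4 ms a WARNING and 25 ms CRITICAL.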

Click TEST CHECK COMMAND:

clip_image108

Click OK:

clip_image110

Looks good here.  Click CLOSE.

DON’T CLICK SAVE YET!   If you do, and you haven’t modified the CHECK SETTINGS tab, the APPLY CONFIGURATION will bitch :)  Click CHECK SETTINGS tab:

clip_image112

Again, make sure that CHECK INTERVAL=5, RETRY INTERVAL=1 and MAX CHECK ATTEMPTS=3.  Note that INITIAL STATE can be set to W(arning), C(ritical), O(K) or U(nknown).  Might want to set that to O.

Click back on COMMON SETTINGS.

clip_image114

Click on MANAGE HOSTGROUPS.

clip_image116

You could click on the “*” and click ADD SELECTED.  It’s reasonable to assume we want a PING sensor on EVERY HostGroup, yes?  If you instead clicked only on the 3 listed here and added them, and then later added a 4th net-new hostgroup, it would not have this PING sensor.  For now, though, let’s just add it to our SERVER-HARDWARE hostgroup.  Click CLOSE.  NOW click SAVE :)

clip_image118

Let’s click APPLY CONFIGURATION.

12) By now you’ll be familiar with: If you go back to the Nagios window (I keep a NAGIOS and a NAGIOS ADMIN tab open), and click HOME -> QUICK FIND, enter “NW-ESX” and click GO:

clip_image120

Look at that.  All the hosts in the hostgroup are now checking Ping as well :)

clip_image122

And moments later show all okay.

Here you can see how the SERVICE DESCRIPTION “check_xi_host_ping” works.  If we go back and change that just to “Ping”:

clip_image124

And then click SAVE, and APPLY, then come back to the HOSTS view:

clip_image126

Ta-da!

I’m going to go through all the same steps, without displaying them, and add an HTTPS sensor, as the IPMI cards are web manageable.  We want to know if the WebUI on them should happen to die.

clip_image128

And look at that.

So as you can see, HOSTGROUPS and SERVICE/SERVICEGROUPS are key to making NAGIOS really sing.  I have NOT touched on ALERTS, CONTACTS, ALERT PERIODS, etc.  For now, let’s worry about if we can get Nagios *monitoring* what we want.

Categories: C6100, Dell, nagios