Archive for the ‘C6100’ Category

C6100 IPMI Issues with vSphere 6

July 15, 2015

So I’m not 100% certain if the issues I’m having on my C6100 server are vSphere 6 related or not.  But I have seen similar issues before in my lab, so it may be one of a few things.

After a recent upgrade, I noted that some of my VMs seemed “slow” – which is hard to quantify.  Then this morning I wake up to having internet but no DNS, so I know my DC is down.  Hosts are up though.  So I give them a hard boot, connect to the IPMI KVM, and watch the startup, only to see “loading module ipmi_si_drv…” and it just sitting there.

In the past, this seemed to be related to a failing SATA disk, and the solution was to pop the disk out – which helped temporarily until I replaced the disk outright.  But these are new drives.  Trying the same here did not work, though I only tried the spinning disks and not the SSDs.  Rather than mess around, I thought I’d find a way to disable IPMI, at least to troubleshoot.

Turns out, I wasn’t alone – though just not specific to vSphere 6:

https://communities.vmware.com/message/2333989

http://www.itblah.com/installing-or-upgrading-to-vmware-vsphere-hypervisor-5-esxi-5-using-the-interactive-method/

https://xuri.me/2014/12/06/avoid-vmware-esxi-loading-module-ipmi_si_drv.html

That last one is the option I took:

  • Press SHIFT+O during the Hypervisor startup
  • Append “noipmiEnabled” to the boot args

Which got my hosts up and running. 

I haven’t done any deeper troubleshooting, nor have I permanently disabled the IPMI with the options of:

Manually turn off or remove the module by turning the option “VMkernel.Boot.ipmiEnabled” off in vSphere or using the commands below:

# Do a dry run first:
esxcli software vib remove --dry-run --vibname ipmi-ipmi-si-drv
# Remove the module:
esxcli software vib remove --vibname ipmi-ipmi-si-drv
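The “VMkernel.Boot.ipmiEnabled” option mentioned above can presumably also be checked and flipped from the ESXi shell rather than the vSphere client.  A minimal, untested sketch, assuming the option is exposed to esxcli as the kernel setting “ipmiEnabled” (list it first to confirm on your build); the DRY_RUN switch and the function names are mine and not from any of the linked posts:

```shell
#!/bin/sh
# Sketch only: inspect and turn off the IPMI boot option from the ESXi shell.
# Assumption: VMkernel.Boot.ipmiEnabled maps to the kernel setting "ipmiEnabled".

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the configured and runtime values of the setting.
show_ipmi_setting() {
    run esxcli system settings kernel list --option=ipmiEnabled
}

# Turn the setting off; it should take effect on the next boot.
disable_ipmi_setting() {
    run esxcli system settings kernel set --setting=ipmiEnabled --value=FALSE
}
```

Calling the functions with DRY_RUN=1 only prints the esxcli lines, which is a cheap way to review them before touching a host.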

We’ll see what comes when I get more time…

Categories: C6100, ESXi, Home Lab, vSphere

Modifying the Dell C6100 for 10GbE Mezz Cards

June 11, 2015 3 comments

In a previous post, Got 10GbE working in the lab – first good results, I talked about getting 10GbE working with my Dell C6100 series.  Recently, a commenter asked me if I had any pictures of the modifications I had to make to the rear panel to make these 10GbE cards work.  As I have another C6100 I recently acquired (yes, I have a problem…) that needs the mods, it seems only prudent to share the steps I took in case it helps someone else.

First a little discussion about what you need:

  • Dell C6100 with the non-removable rear panel plate (the removable-plate version needs no cutting)
  • Dell X53DF/TCK99 2 Port 10GbE Intel 82599 SFP+ Adapter
  • Dell HH4P1 PCI-E Bridge Card

You may find the Mezz card under either part number – it seems that the X53DF replaced the TCK99.  Perhaps one is the P/N and one is the FRU or some such.  But you NEED that little PCI-E bridge card.  It is usually included, but pay special attention to the listing to ensure it is.  What you DON’T really need is the mesh back plate on the card – you can get it bare. 

2015-06-11 21.18.132015-06-11 21.17.46

Shown above are the 2-port 10GbE SFP+ card in question, and also the 2-port 40Gb InfiniBand card.  Above them both is the small PCI-E bridge card.

2015-06-11 21.19.24

You want to remove the two screws to remove the backing plate on the card.  You won’t be needing it, and you can set it aside.  The screws attach through the card and into the bracket, so once removed, reinsert the screws to the bracket to keep from losing them.

2015-06-11 21.17.14

Here we can see the back panel of the C6100 sled.  Ready to go for cutting.

2015-06-11 21.22.232015-06-11 21.24.48

You can place the factory rear plate over the back plate.  Here you can see where you need to line it up and mark the cuts you’ll be doing.  Note that of course the bracket will sit higher up on the unit, so you’ll have to adjust for your horizontal lines. 

2015-06-11 21.23.092015-06-11 21.22.49

If we look to the left, we can see the source of the problem that causes us to have to do this work.  The back panel here is not removable, and wraps around the left corner of the unit.  In systems with the removable plate, this simply unscrews and the panel attached to the card slots in.  On the right hand side you can see the two screws that would attach the panel and card in that case.

2015-06-11 21.35.38

Here’s largely what we get once we complete the cuts.  Perhaps you’re better with a Dremel than I am. Note that the vertical cuts can be tough depending on the size of the cutting disk you have, as they may have interference from the bar to remove the sled. 

2015-06-11 21.36.162015-06-11 21.36.202015-06-11 21.36.28

You can now attach the PCI-E bridge card to the Mezz card, and slot it in.  I found it easiest to come in at about a 20 degree angle and slot the 2 ports into the cut outs, then drop the PCI-E bridge into the slot.  When it’s all said and done, you’ll find it pretty secure and good to go.

That’s really about it.  Not a whole lot to it, and if you have it all in hand, you’d figure it out pretty quick.  This is largely to help show where my cut lines ended up compared to the actual cuts and where adjustments could be made to make the cuts tighter if you wanted.  Also, if you’re planning to order, but are not sure if it works or is possible, then this is going to help out quite a bit.

Some potential vendors I’ve had luck with:

http://www.ebay.com/itm/DELL-X53DF-10GbE-DUAL-PORT-MEZZANINE-CARD-TCK99-POWEREDGE-C6100-C6105-C6220-/181751541002? – accepted $60 USD offer.

http://www.ebay.com/itm/DELL-X53DF-DUAL-PORT-10GE-MEZZANINE-TCK99-C6105-C6220-/181751288032?pt=LH_DefaultDomain_0&hash=item2a513890e0 – currently lists for $54 USD, I’m sure you could get them for $50 without too much negotiating.

Categories: C6100, Dell, Hardware, Home Lab

PernixData FVP v1.5 GA on vSphere v5.5 First Look

March 15, 2014 1 comment

So one of my most recent posts was about fixing my UUID issue on my Dell C6100 series server.  Of course, what prompted that initially and identified the problem, was PernixData’s FVP product – way back in the 0.9 Beta if I recall.  Now that I’ve gotten this solved, of course, I wanted to give FVP a try again. 

So out go some e-mails to PernixData with a request for download (http://www.pernixdata.com/trial/ – go request a trial!  You’ll like it…)  A quick chat with Chris Floyd (@phloider) and Peter Chang (@virtualbacon) gets me set up with the trial again.  However, a quick look says “.. vSphere v5.0 and v5.1…”  Well that’s no good, I’m on v5.5.0 U1 (of course, why not be an early adopter :) ).  So that looks like it’s out of the question.  Then they tell me the new version is supposed to GA on Monday March 17.  Well I can wait that long I figure.  That lasted until about 7PM on Friday, at which point I went to download the beta anyway.

image

Not being up on the current version number (I hadn’t been keeping track; what with the UUID issue, why disappoint myself further that my hardware doesn’t like their software), I go ahead and download the ‘beta’ figuring I’ll give it a try.  Not 10 minutes later I get an e-mail from Chris with a subject line of “New plans for the weekend…” the body of which stated: “You were the first person to download 1.5 GA. Let me know what you think.”

Well dammit.  I’m not waiting till Monday now :) 

First, nothing in this post should supersede what’s in the documentation – which is actually really good.  This is my notes version, and cheat sheet.  If you follow my notes and didn’t read their documentation at all – that’s on you.  With that said… let’s begin!

 

1) Install and configure the Management Server

 

I’ve chosen to install this in my lab on my vCenter server using the same svcVMware AD account.  Run PernixData FVP Management Server – 1.5.03869.0.exe and start the installation.

image

This really is the first screen that isn’t “Next, Next, Finish-y”. 

image

I’ve opted to use the same SQL_EXPRESS instance used by my vCenter Server – probably not the best way to go if in Production, but works well enough here.

image

Next we tell the FVP Management Server how it should be found on the network.

image

And then click INSTALL.

image

A JRE?  Yeah, go ahead and install that too if it’s needed.

 

2) Configure FVP

 

Next, you’d normally install the plug in.  The vSphere Client Plug-in for FVP v1.5 is only for vSphere v5.0 or v5.1.  For v5.5 the plug in is installed in the vSphere Web Client – and there’s nothing to do, as the installer added it to vCenter Server. 

image

So log in to the vSphere web client and click on vCenter.  You’ll see a PernixData FVP section at the bottom.  Click on FLASH CLUSTERS.

image

Click CREATE

image

Name your cluster and select the cluster you want to attach it to.  Click OK.

image

Next you’ll see the Getting Started tab.  Click on the MANAGE tab.

image

It will show FLASH DEVICES.  Click ADD DEVICE.  You’ll quickly get prompted that you’re a fool and haven’t installed the software on the hosts.  Duly noted. 

 

3) (should have been 2) Add the FVP Extensions to the host(s)

 

Installation is either by uploading to the host and installing via SSH, or via VUM – which is “Experimental” at this stage.  However, I would like to see the VUM method work as it is more automated, so let’s give that a try.
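For reference, the manual SSH route would look roughly like the following.  This is a sketch under assumptions, not the documented FVP procedure: the bundle path is hypothetical, whether maintenance mode is required is for the FVP documentation to say, and the DRY_RUN switch and function name are mine:

```shell
#!/bin/sh
# Sketch of the manual (non-VUM) route, run on each ESXi host over SSH.
# Assumption: the host extension zip was already copied to a datastore;
# the path passed in is a placeholder.

# Run a command, or just print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# install_host_extension <absolute path to the offline bundle zip>
install_host_extension() {
    bundle="$1"
    run esxcli system maintenanceMode set --enable true &&
    run esxcli software vib install -d "$bundle" &&
    run esxcli system maintenanceMode set --enable false
}
```

With four hosts this gets repetitive quickly, which is the argument for letting VUM do it.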

image

In the vSphere Client, browse to HOME –> SOLUTIONS –> UPDATE MANAGER.  Click on the PATCH REPOSITORY tab.  Click IMPORT PATCHES.

image

Browse to where you’ve unpacked your FVP v1.5 software, and select the ESXi v5.5 update.  Click NEXT.  You may get prompted to install/accept/ignore a certificate – do so.

image

Click FINISH.

image

I’d never seen the patches not show up right away, but apparently my vCenter was busy.  Watch the RECENT TASKS pane to confirm the patches are imported. 

image

Then confirm by entering PERNIX in the search box.

Click on the BASELINES AND GROUPS tab, and click CREATE on the BASELINE side.

image

Name your baseline and select HOST EXTENSION.  Click NEXT.

image

Search for Pernix, click the down arrow to add it to the lower window, and click NEXT.

image

On the READY TO COMPLETE screen, click FINISH.

image

If you have a Baseline Group you may want to add the Extension to your Baseline Group.  Click COMPLIANCE VIEW in the upper right to return to your hosts and clusters view.  Select your cluster and click SCAN to check for updates required.

image

Click REMEDIATE.  Then select only the EXTENSIONS BASELINE and select the PERNIXDATA FVP v1.5 GA baseline.  Check all applicable hosts and click NEXT.

image

Click NEXT, NEXT, then set your remediation options.  I like to disable removable media and set my retries for every 1 minute and 33 retries – largely because it’s easy to type/change with one hand.  Click NEXT.

image

Choose whatever remediation options make you happy and click NEXT and FINISH.  Then wait for the magic to happen.

 

4) NOW configure FVP 🙂

 

Now that you’ve added the extensions, let’s go back to adding devices:

image

Only 2 of my 4 hosts are showing up right now – that’s fine.  I’m going to choose to add my Kingston V300 120GB SSD’s (here’s hoping they work and are on the HCL), and click OK.

image

Now that the devices show up, click on DATASTORES/VM’s

image

Next we’ll click ADD DATASTORE.

image

Only one of my datastores is iSCSI, and FVP only accelerates block devices – FCP, FCoE, or iSCSI – so obviously no local DAS datastores either.  So select the appropriate iSCSI (in my case in the lab) datastore and caching method (Write Through or Write Back) and click OK.  As I want maximum performance I’m going to choose Write Back.

image

Except when I try that, it tells me all my hosts need to be ready.  So I’ll finish my FVP Extension installations and then retry.  Okay, and there we go :)

image

Now we can not only select Write Back, but also select the Write Redundancy.  In order for Write Back to be safe, we need to select a mirror/parity for that cache on another host in case of the host with the primary cache failing.  For my lab, HOST+1 is more than enough.

image

Understandably, it will take a little bit of time for VM’s to start caching, and then for that cache to populate on the additional nodes.  Here you can see some VM’s are CONFIGURED for Write Back, but have a current status of Write Through. 

image

If we go click on MONITOR and PERFORMANCE we can start to see some stats on what’s happening.  Note that my lab isn’t very busy, so we shouldn’t expect to see much.

image

We can see the IOPS as well. 

So let’s go log into a VM on the datastore and run a benchmark.  I’ll use Atto Bench32 which is what I use for quick and dirty throughput tests.  Note that this is not a good IOPS test, but it does give a decently quick indication as to performance and health.

image

Here you can see some pretty amazing numbers.   At 4.0KB, we’re seeing 2.5x write and 2x read numbers.  By the 16.0KB block size, it’s not even fair any more.  That’s not bad for a couple of $70 SSD’s.

image

But let’s look at what the FVP console gives us.  First we get a wealth of metrics that the vSphere performance monitor alone doesn’t give us.  You can clearly see that the VM was able to observe almost 9000 IOPS – which is nothing to bitch about. 

image

So based on this, I’m pretty happy.  I do have to do more testing, get some tweaks in, and better understand the settings.  But clearly I’m going to be able to push the lab a little harder. 

 

Observations and Conclusion:

 

For my needs, in my lab, speed is critical.  While I’m by no means business centric, “time is money” and the faster the equipment is, the more things I can do, which means the more I can test and the more I can learn.  I already know how to watch progress bars – so anything I can do to reduce that, will maximize my time.

Secondly, this is pretty amazing for the cost of 4x $70 120GB SSD’s.  Would you use this class of consumer grade MLC in Production with FVP?  Probably (hopefully) not.  But you could make an argument to do so, and just treat them like printer toner cartridges and replace them periodically – as long as that period didn’t fail at the worst time or require a large amount of time swapping SSD’s.

Clearly, I’ve solved the C6100 duplicate UUID/Service Tag problem :) 

I’ll be doing additional testing in a bit.  But after hearing I was the first to download the GA code, I wanted to be the first to get something up about it.  Hopefully this will help someone else get started up quickly and easily. 

It’s late – time for bed.  But this post was a long time coming – damned C6100 UUIDs…

Categories: C6100, ESXi, PernixData, SSD, vSphere

HOWTO: Dell C6100 FRU / UUID Update–FINALLY!

March 13, 2014

So this post has been a LONG time coming, and I’m pretty sure I’m good to go now.

As you know, the Dell C6100 is a great 4 node in 2U chassis, which works really well for a compact home lab (if you can stand the noise).  vSphere likes it, Hyper-V likes it, what’s to complain about?

Then I tried the beta of PernixData FVP.  It worked as advertised, was a simple installation, did what it was supposed to – kind of.  I noticed that it seemed like only the very last node I rebooted was the one with FVP running on it.  I did some tests, did some more installations, and watched as the next host I rebooted became the only one with the software running. 

So, given it was beta, I reached out to support – and support from PernixData was great.  Given all the troubleshooting I’d done, I gave them all the information I could find: screenshots, logs, processes, steps and sequences.  I’ll be damned if they didn’t come back pretty quickly with a suggestion – I must have duplicate UUIDs on the hosts.  Bollocks I say, ESXi has been happy, no complaints, no worries, whatever do you mean.

Support says “browse to https://<host>/mob/?moid=ha-host&doPath=hardware%2esystemInfo, and confirm the UUID string is different on each host”.  No problem:

NW-ESXI2:

image

NW-ESXI3:

image

Well I’ll be damned –

uuid string "4c4c4544-0038-5410-8030-b4c04f4d4c31"
On all 4 nodes.  Okay so that IS my problem. 
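With four hosts you can eyeball this, but the comparison is easy to script once the values are collected.  A small sketch, assuming you have pasted one “host uuid” pair per line (how you collect them from the MOB page is left open); the function name is mine:

```shell
#!/bin/sh
# Sketch: given lines of "<host> <uuid>" on stdin, print every UUID that
# appears on more than one host, followed by the hosts sharing it.

find_duplicate_uuids() {
    awk '
        { hosts[$2] = hosts[$2] " " $1; count[$2]++ }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u ":" hosts[u]
        }
    '
}
```

Feeding it the NW-ESXI2 and NW-ESXI3 values from above prints the shared UUID with both host names after it; a healthy set of hosts prints nothing.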

VMware even has a KB on it – http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006250.  Not that this is a “Whitebox”, but it certainly is an OEM custom, by definition.  So we’ll go with that. 

See, on a C6100 you have a typical Dell Service Tag – eg: ABC123A for the chassis.  But each ‘sled’ has a .# after it.  So you’ll have ABC123A.1, ABC123A.2, ABC123A.3, and ABC123A.4.  Turns out this makes ESXi assign the same UUID.  Some Googling tells me that this is also apparently an issue for SCVMM and SCSM.  As DCS never really intended these systems to end up in “Enterprise” or “Home Lab” hands, but very large cloud providers, there’s no reason to care.  And fairly enough, it didn’t have any impact on my normal vSphere lab. 

Now.  How the heck do you update it?  The BIOS doesn’t give you an option.  Some posts on the internet suggest you could upload a new BIOS and specify it then, but that didn’t work out.  Dell was no help – and I don’t fault them one bit.  The system is used, off warranty, and used by someone it wasn’t intended to be supported by.  That’s fully on me, I have no complaints.  But I still wanted it fixed. :)

I spend a lot of time at www.servethehome.com and this is a good place for a wealth of C6100 information.  A thread caught my attention where it noted these issues.  One particular post by TehSuk caught my attention – http://forums.servethehome.com/processors-motherboards/1865-smbios-guid-2.html#post23817.  Apparently you can just run the Windows version of IPMIUTIL.exe with the following options:

ipmiutil.exe fru -s %newassettag%

Reboot, and you’re good to go.  No such luck.  See, the user in question notes that he’s a Windows shop.  No such luck with ESXi.  So I tried making a DTK bootable ISO from Dell using some information they had, but that wasn’t working.  Various issues from the methods being written a while back and not supported on Windows 8 (which took me a bit to figure out that was my issue) to the tools having issues with creating a 32bit ISO on a 64bit system due to environment variables, DLL’s not found, etc.  Nothing the end of the world, but I didn’t like that path. 

Then I remembered that you can use IPMIUTIL.exe across a network.  I had no luck when I tried months ago, so why would it work now?   Other than I’ve now spent more time playing with the utility. 

image

Running:

ipmiutil.exe fru -N <hostname/IP> -U <user> -P <password>

Was able to get me a listing which included “Product Serial Num”.  So could I use the same “fru -s %SERNUM%” suggested by TehSuk? 

ipmiutil.exe fru -s AAAAAA3 -N <hostname/IP> -U <user> -P <password>

image

Sure enough, it will change “Product Serial Number” to AAAAAA3.  So let’s reboot and find out what it says.
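Since the same command has to be repeated per sled, it can be wrapped up.  A sketch only, assuming a Linux build of ipmiutil on the PATH (I ran ipmiutil.exe on Windows); the BMC address, user, password and serial in the usage line are placeholders, and the DRY_RUN switch and function names are mine:

```shell
#!/bin/sh
# Sketch: read a sled's FRU data and set its Product Serial over the network.
# Assumption: "ipmiutil" accepts the same fru options as ipmiutil.exe above.

# Run a command, or just print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# show_sled_fru <bmc-host> <user> <password>
show_sled_fru() {
    run ipmiutil fru -N "$1" -U "$2" -P "$3"
}

# set_sled_serial <bmc-host> <user> <password> <new-serial>
set_sled_serial() {
    run ipmiutil fru -s "$4" -N "$1" -U "$2" -P "$3"
}
```

For example, DRY_RUN=1 set_sled_serial <hostname/IP> <user> <password> AAAAAA1 prints the command for the first sled without sending anything to the BMC.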

After updating the first 3 nodes, and checking the MOB link, looks like we have success:

NW-ESXI1:

"4c4c4544-0041-4110-8041-c1c04f414131"

NW-ESXI2:

"4c4c4544-0041-4110-8041-c1c04f414132"

NW-ESXI3:

"4c4c4544-0041-4110-8041-c1c04f414133"

NW-ESXI4:

"4c4c4544-0038-5410-8030-b4c04f4d4c31"

No need to change the fourth node – leave it with the original Service Tag, as it no longer conflicts. 

 

So in the end, all you’re going to need is:

http://ipmiutil.sourceforge.net/FILES/ipmiutil-2.9.2-win64.zip

And run the above IPMIUTIL.exe FRU commands, and you should be good to go.  I haven’t checked if PernixData FVP now works better for me yet as it’s late – but here’s hoping it does.  If nothing else, the UUID’s are now different, as they should be!

BTW, please don’t read any of this as though I was disappointed with PernixData FVP – heck, if anything they helped me find this issue, pointed me in the right direction, and I wanted their software to work because my testing showed it made an AMAZING difference.   I’m looking forward to retrying the software across all 4 nodes.

HOWTO: Monitoring Dell C6100 IPMI with Nagios.

July 14, 2013

Recently, I’ve been working with Nagios for network monitoring.  I have to admit, I came in rather biased, and was frustrated with it.  My frustrations will best be covered in another post.  In my home lab, however, I decided that I was going to make Nagios sing.  This is the first HOWTO I’m doing, although really the first one should have been installing the VM and getting things running.  I’ll do that one soon.

While this HOWTO is going to seem very long, once you get used to how to configure the basics, this all has a very nice rhythm to it.  Is it better or worse than other monitoring apps?  Maybe.  But it is what it is – and I don’t think it’s that bad!

GOAL:   Monitor IPMI (eg: SuperMicro IPMI, Dell C6100 Series IPMI, Dell IDRAC, IBM RMU, etc) via Nagios.

1) Find your Nagios plugin of choice.  I did this by searching the Nagios Plugin Directory for Popularity.  This brought me to WFISCHER’s IPMI Sensor Monitoring Plugin  (http://exchange.nagios.org/directory/Plugins/Hardware/Server-Hardware/IPMI-Sensor-Monitoring-Plugin/details).

clip_image002

Click on the DOWNLOAD URL, and save the file somewhere – like your DOWNLOADS folder:

clip_image004

Next, unpack the file – it’s a TAR.GZ, with a TAR inside.  So use 7Zip or something.

clip_image006

Open the TAR:

clip_image008

Extract this somewhere, such as D:\TEMP2:

clip_image010

2) Open NagiosXI and login.

clip_image012

Click on CONFIGURE on the top and then CORE CONFIG MANAGER on the left.

clip_image014

Click on MONITORING PLUGINS

clip_image016

Click BROWSE, locate the file “check_ipmi_sensor” in the folder above.

clip_image018

Then click UPLOAD PLUGIN.

clip_image020

The plugin is now showing as installed.

In the table in the bottom half of the window, confirm the file is present:

clip_image022

Again click CONFIGURE, and CORE CONFIG MANAGER.  Then click APPLY SETTINGS so the uploaded file is now part of the configuration.

clip_image024

clip_image026

clip_image028

There we go, Nagios Core now knows about the plugin.

3) Before we go any further, let’s take a look at the README file that came with the package:

clip_image030

Aha!  Requirements!

The Nagios VM we downloaded is CentOS 6 based.  So use PuTTY and SSH to the host – 10.0.0.150 in my case, and login as “root”, default password “nagiosxi” (you really should change this)

Let’s get FreeIPMI installed.  Run “yum install freeipmi”:

clip_image032

In my case, I already have it installed.  If it were not installed, it would say that it found the package and ask “Do you wish to install: Y/N” and you would answer yes.

Next, let’s get Perl IPC::Run installed.  Run “yum install perl-IPC-Run”:

clip_image034

Same applies here.

NOTE: You may wish to do a general “yum update” and let it update all currently installed packages.  That’s up to you, YMMV and if you break it, you bought it.

So now we have our pre-requisites installed.
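If you’d rather confirm that than trust the yum output, a quick check from the same SSH session works.  A sketch, assuming the FreeIPMI package provides an “ipmi-sensors” binary on this system; the function names are mine:

```shell
#!/bin/sh
# Sketch: confirm the plugin's two prerequisites are actually usable.
# Assumption: FreeIPMI installs a command called "ipmi-sensors".

# Succeeds if the named command is on the PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Succeeds if perl can load the named module.
have_perl_module() { perl -M"$1" -e 1 >/dev/null 2>&1; }

# Print a hint for each missing prerequisite; print nothing if all is well.
check_prereqs() {
    have_cmd ipmi-sensors || echo "missing: freeipmi (yum install freeipmi)"
    have_perl_module IPC::Run || echo "missing: IPC::Run (yum install perl-IPC-Run)"
}
```

Run check_prereqs after the two yum commands; no output means both pieces were found.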

4) Let’s test the plugin from the command line.  Run “cd /usr/local/nagios/libexec”:

clip_image036

Okay, so the plugin IS in the plugins folder!  Good.

Now run “./check_ipmi_sensor”:

clip_image038

Guess we’ll need to feed it some parameters.  On my C6100, the IPMI user is “root” with a default password of “root”.  (yeah, you should change that too).  The priv level can be USER, OPERATOR or ADMIN, but USER is sufficient for read.  You may want to create an IPMIuser account vs ROOT, choice is yours.    My 4 C6100 nodes’ IPMI IP addresses are 10.0.0.241-244.

So run “./check_ipmi_sensor -H 10.0.0.241 -U root -P root -L user”

clip_image040

Look at that.  It’s practically magic.

Now, we know that the pre-requisites are working and that the check command works from the Linux command line.  So if it doesn’t work from here – it’s a Nagios problem!
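To test all four nodes in one go rather than one IP at a time, the same command can be looped.  A sketch using the address range, user and password quoted above; the function names are mine:

```shell
#!/bin/sh
# Sketch: run the same check against all four BMCs from the shell.
# The address range and credentials are the lab defaults from this post.

# Print 10.0.0.241 through 10.0.0.244, one per line.
node_ips() {
    i=241
    while [ "$i" -le 244 ]; do
        echo "10.0.0.$i"
        i=$((i + 1))
    done
}

# Run the plugin against each address, with a header line per node.
check_all_nodes() {
    for ip in $(node_ips); do
        echo "== $ip"
        /usr/local/nagios/libexec/check_ipmi_sensor -H "$ip" -U root -P root -L user
    done
}
```

If one node answers and another doesn’t, that points at the BMC or its credentials rather than at the plugin.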

5) Let’s start by creating a HOSTGROUP.  HOSTGROUPS are used to group hosts together (like that?) so that you can manage them by group vs individually.  The nice thing about this is say you decide to add a sensor – do you want to add it to 50 devices or 1 host group?  I thought so.

Click on CONFIGURE, CORE CONFIG MANAGER.  On the left under MONITORING, click HOST GROUP:

clip_image042

Here you can see the default host groups.  We’re going to click ADD NEW.

clip_image044

We’re just going to give it a HOSTGROUP NAME and a DESCRIPTION.  Note that on the left, we could MANAGE HOSTS and MANAGE HOSTGROUPS – but because we’re starting here, we have none of either.  But Nagios is chicken-egg.  We could add 40 hosts, then add a hostgroup, then when creating the hostgroup, add the 40 hosts to the hostgroup.  Make sure that ACTIVE box is checked.  Click SAVE.

clip_image046

And as it says, click APPLY CONFIGURATION to make the changes take effect.

clip_image048clip_image050

Alright, now let’s go get some hosts!

6) Let’s configure us some Hosts and Services.

Click on CONFIGURE, CORE CONFIG MANAGER.  On the left under MONITORING, click HOSTS:

clip_image052

Here you can see I’ve already configured two of the hosts.  I’m going to configure the 3rd to show how this looks.

Click ADD NEW.

clip_image054

Enter a HOSTNAME (logical, not actual), ADDRESS (I’m using IP Address as I realized I haven’t set up the IP’s with DNS names yet, my bad), and DISPLAY NAME (probably best to use the same as HOSTNAME – whatever standard makes you happy).

Ensure that ACTIVE on the right is checked.    Now, if you’re familiar with Nagios at all (mostly just a little), you’ll think “But….. what about the CHECK COMMAND?  We need a check command!”.  No, we don’t.  Remember, we’re going to add all the services we want to monitor to the HOST GROUP!

Click on the CHECK SETTINGS tab:

clip_image056

Ensure that CHECK INTERVAL is set to something such as 5 minutes, RETRY INTERVAL (such as when it fails the first check) something like 1 minute, and MAX CHECK ATTEMPTS = 3-5 – whatever keeps you happy.  If this is empty, then later on you’ll get an error.

Click SAVE.

clip_image058

You’ll see that the DATABASE ENTRY was successfully updated.  But the SYNC STATUS is SYNC MISSED.  We need to APPLY CONFIGURATION – but let’s not do that just yet.    Click on the clip_image060 icon to configure the service again.

clip_image062

This time, let’s click on MANAGE HOSTGROUPS.

clip_image064

On the left, under HOSTGROUPS, find the previously created HOSTGROUP “server-hardware” and click ADD SELECTED.  Then click CLOSE.  Then click SAVE.

We’ve now added the HOST to the HOSTGROUP.  We’re not going to configure anything individually on the HOST, we’re going to do it all by HOSTGROUPS.

clip_image066

Here you can see the SYNC MISSED for all 3 hosts, as I’ve added them all to the HOSTGROUP behind the scenes.

Click APPLY CONFIGURATION.
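For anyone more used to plain Nagios Core, the GUI steps so far correspond roughly to a hostgroup and a host object.  A sketch that prints what those definitions might look like, assuming standard Nagios Core directives; the alias and the host name in the usage line are illustrative, the definitions are not complete (no contacts or templates), and the files Core Config Manager actually writes are not shown in this post:

```shell
#!/bin/sh
# Sketch: print Nagios object definitions roughly equivalent to the GUI steps.
# Assumption: directive names are the standard Nagios Core ones.

# print_hostgroup <name> <alias>
print_hostgroup() {
    printf 'define hostgroup {\n'
    printf '    hostgroup_name  %s\n' "$1"
    printf '    alias           %s\n' "$2"
    printf '}\n'
}

# print_host <host_name> <address> <hostgroup>
print_host() {
    printf 'define host {\n'
    printf '    host_name           %s\n' "$1"
    printf '    address             %s\n' "$2"
    printf '    hostgroups          %s\n' "$3"
    printf '    check_interval      5\n'
    printf '    retry_interval      1\n'
    printf '    max_check_attempts  3\n'
    printf '}\n'
}
```

For example: print_hostgroup server-hardware "Server Hardware" followed by print_host NW-ESXI03-IDRAC 10.0.0.243 server-hardware.  Note there is no check command on the host, which matches the approach of hanging everything off the hostgroup.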

7) Next, in Nagios, click on HOME -> QUICK FIND and enter a substring of “NW-ESX”:

clip_image068

You’ll see a suggestion list pop up, but just click GO:

clip_image070

So what you see here on the first two is that I set them up previously WITH a check command on the host for PING.  Ignore this.  But what you see is that the two new ones I’ve added show PENDING.  And they’ll never get beyond PENDING, as there is no check.

8) Click on ADMIN -> CORE CONFIG MANAGER -> COMMANDS -> COMMANDS:

clip_image072

clip_image074

Here I HAVE already configured the command, but let’s click ADD NEW to simulate what it would look like.

clip_image076

<snip>

clip_image078

So here we want to:

Enter the COMMAND NAME.  This is the same command you ran at the command line – “check_ipmi_sensor”.  Note that sometimes this might have an extension, such as “check_ipmi_sensor.sh” or “check_ipmi_sensor.pl”, etc.  Ours does NOT.

On the commandline enter “$USER1$/check_ipmi_sensor” – this is always going to be the case.  $USER1$ is the plug in folder.  Same rules apply about watching for an extension to the file.

The other parameters should look familiar based on the command line.   -U -P -L relate to USER/PASS/LEVEL.  Click SAVE.

Click APPLY CONFIGURATION.
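Written out as a Nagios Core object, the command from this step would look something like the following.  A sketch under assumptions: I’m assuming the host is passed with the $HOSTADDRESS$ macro, and the ARG numbering here starts at ARG1, which is not necessarily what is in the screenshot; the function name is mine:

```shell
#!/bin/sh
# Sketch: print a command definition matching the GUI fields from this step.
# The quoted heredoc keeps the Nagios $MACRO$ names from being expanded by the shell.
# Assumptions: -H takes $HOSTADDRESS$; ARG1..ARG3 carry user, password, level.

print_ipmi_command() {
    cat <<'EOF'
define command {
    command_name  check_ipmi_sensor
    command_line  $USER1$/check_ipmi_sensor -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -L $ARG3$
}
EOF
}
```

Whatever numbering you pick, the ARGs only have to line up with the order you supply them in the service’s check command.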

9) Click on CONFIGURE -> CORE CONFIG MANAGER -> MONITORING -> SERVICES:

clip_image080

No services are defined.  So let’s click ADD NEW.

clip_image082

Enter the CONFIGNAME and DESCRIPTION.  I don’t know that either of these really matters, but I’ve chosen to name them the same as the command.  Enter a DISPLAY NAME, this is what you’ll see in the HOSTS/SERVICES list.

Change the CHECK COMMAND to “check_ipmi_sensor” from the list and check ACTIVE.  You’ll note the COMMAND VIEW shows the same details we entered in the previous COMMAND configuration.  I made a mistake and used ARG2/ARG3/ARG4 thinking HOSTNAME was ARG1, but it doesn’t matter.  As long as the variables you put into the ARG’s match their place in the command line.

Click TEST CHECK COMMAND:

clip_image084

Enter an IP address of a sample host, and click OK

clip_image086

Looks like what we got at the command line.  Nice.  Click CLOSE.

Now click MANAGE HOSTGROUPS:

clip_image088

clip_image090

Click the “server-hardware” hostgroup, click ADD SELECTED and click CLOSE.

Click on the CHECK SETTINGS tab:

clip_image092

Same as for hosts, ensure CHECK INTERVAL, RETRY INTERVAL and MAX CHECK ATTEMPTS are filled in.  Click SAVE.

clip_image094

Can you guess what we do now?  Click APPLY CONFIGURATION.
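The service from this step, expressed as a Nagios Core object attached to the hostgroup, might look like the output of the sketch below.  Assumptions: the check command takes user, password and privilege level as its first three arguments (the numbering in my own config differs, as noted above), the interval values are examples, and the definition is not complete; the function name is mine:

```shell
#!/bin/sh
# Sketch: print a service definition bound to a hostgroup instead of a host.
# Assumption: check_ipmi_sensor takes user, password and level as ARG1..ARG3.

# print_ipmi_service <hostgroup> <ipmi-user> <ipmi-password> <priv-level>
print_ipmi_service() {
    printf 'define service {\n'
    printf '    hostgroup_name       %s\n' "$1"
    printf '    service_description  check_ipmi_sensor\n'
    printf '    check_command        check_ipmi_sensor!%s!%s!%s\n' "$2" "$3" "$4"
    printf '    check_interval       5\n'
    printf '    retry_interval       1\n'
    printf '    max_check_attempts   3\n'
    printf '}\n'
}
```

For example: print_ipmi_service server-hardware root root user.  Because the service names a hostgroup and not a host, every host added to that group later picks it up without further work.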

10) If you go back to the Nagios window (I keep a NAGIOS and a NAGIOS ADMIN tab open), and click HOME -> QUICK FIND, enter “NW-ESX” and click GO:

clip_image096

You see all 4 of our hosts suddenly have a service!  And they’re all pending.   Given a little bit of time, they’ll start to check:

clip_image098

Click on the NW-ESXI01-IDRAC CHECK_IPMI_SENSOR service that shows IPMI STATUS: OK

clip_image100

Well that’s boring.  I was hoping for more detail.  Maybe click PERFORMANCE GRAPHS:

clip_image102

(I had to change the zoom level to get more detail on the screen).

Oh would you look at that.  So our one sensor is multi-channeled.  We get all our sensors in one polling.  It also creates a chart for each of them.  That’s pretty handy, so we can now trend our fan/temp/etc.

So what we have done so far is:

· Upload a new plugin.

· Install plugin dependencies

· Test the plugin at the command line to verify it works outside of Nagios

· Create a hostgroup

· Create hosts and add them to a hostgroup

· Create a command from the plugin

· Create a service tied to the command

· Add the service to a hostgroup – which automatically adds them to all hosts in the hostgroup.

· Verified that the hosts individual sensors show all the sensors not just one, and are logging all the historical detail.

To further demonstrate how hostgroups and services work, let’s add another service – just a basic PING service.

11) Click on CONFIGURE -> CORE CONFIG MANAGER -> MONITORING -> SERVICES:

clip_image104

Click ADD NEW

clip_image106

Change the CHECK COMMAND to “check_xi_host_ping”, which is pre-defined.  Check ACTIVE.   Note that the command wants ARG1-ARG4.  These are just the thresholds for the “-w” (warning) and “-c” (critical) levels.  Let’s say that 3,5 and 10,20 (ms response) indicate those levels.  Enter the CONFIG NAME, DESCRIPTION (which I again make match the CHECK COMMAND) and then a DISPLAY NAME.

Click TEST CHECK COMMAND:

clip_image108

Click OK:

clip_image110

Looks good here.  Click CLOSE.

DON’T CLICK SAVE YET!   If you do, and you haven’t modified the CHECK SETTINGS tab, the APPLY CONFIGURATION will bitch :)  Click the CHECK SETTINGS tab:

clip_image112

Again, make sure that CHECK INTERVAL=5, RETRY INTERVAL=1 and MAX CHECK ATTEMPTS=3.  Note that INITIAL STATE can be set to W(arning), C(ritical), O(K) or U(nknown).  Might want to set that to O.

Click back on COMMON SETTINGS.

clip_image114

Click on MANAGE HOSTGROUPS.

clip_image116

Click on the “*” and click ADD SELECTED.  It’s reasonable to assume we want a PING sensor on EVERY hostgroup, yes?  If you instead selected only the 3 hostgroups listed here and then later added a 4th, net-new hostgroup, it would not get this PING sensor.  For now, let’s just add it to our SERVER-HARDWARE hostgroup.  Click CLOSE.  NOW click SAVE :)

clip_image118

Let’s click APPLY CONFIGURATION.
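
For reference, what the Core Config Manager stores boils down to a Nagios object definition. The Python sketch below just renders one as text so you can see how the pieces from the screens above fit together. The directive names are standard Nagios Core ones, but the exact file XI writes will have more in it (templates, check periods, contacts), so treat this as an approximation rather than XI's actual output:

```python
def render_service(hostgroup, description, command, args,
                   check_interval=5, retry_interval=1, max_check_attempts=3):
    """Render a minimal Nagios 'define service' block as text."""
    # Arguments are appended to the command name, separated by "!".
    check_command = "!".join([command] + [str(a) for a in args])
    directives = [
        ("hostgroup_name", hostgroup),
        ("service_description", description),
        ("check_command", check_command),
        ("check_interval", check_interval),
        ("retry_interval", retry_interval),
        ("max_check_attempts", max_check_attempts),
    ]
    body = "\n".join(f"    {name:<24}{value}" for name, value in directives)
    return "define service {\n" + body + "\n}"


definition = render_service(
    "SERVER-HARDWARE", "check_xi_host_ping", "check_xi_host_ping", [3, 5, 10, 20]
)
print(definition)
```

Note how ARG1-ARG4 end up tacked onto the command name, and how the service points at the hostgroup rather than at any single host.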

12) By now you’ll be familiar with this: go back to the Nagios window (I keep a NAGIOS and a NAGIOS ADMIN tab open), click HOME -> QUICK FIND, enter “NW-ESX” and click GO:

clip_image120

Look at that.  All the hosts in the hostgroup are now checking Ping as well :)

clip_image122

And moments later show all okay.

Here you can see how the SERVICE DESCRIPTION “check_xi_host_ping” works.  If we go back and change that to just “Ping”:

clip_image124

And then click SAVE, and APPLY, then come back to the HOSTS view:

clip_image126

Ta-da!

I’m going to go through all the same steps, without displaying them, and add an HTTPS sensor, as the IPMI cards are web manageable.  We want to know if the WebUI on them should happen to die.

clip_image128

And look at that.
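
If you want a quick sanity test of the same thing outside of Nagios, a bare TCP connect to port 443 is about the simplest stand-in. This Python sketch is not what the Nagios HTTPS check does (that one speaks HTTP over TLS); it only tells you whether something is listening:

```python
import socket


def port_is_open(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Call it as `port_is_open("192.0.2.10")`, swapping that placeholder address for one of your BMC IPs.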

So as you can see, HOSTGROUPS and SERVICE/SERVICEGROUPS are key to making NAGIOS really sing.  I have NOT touched on ALERTS, CONTACTS, ALERT PERIODS, etc.  For now, let’s worry about if we can get Nagios *monitoring* what we want.
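
A toy Python model of why hostgroups save so much clicking: services attached to a hostgroup fan out to every member host. The host names and the "IPMI Sensors" service name below are placeholders I picked for the example, and this is only an illustration of the idea, not how Nagios resolves objects internally:

```python
def expand_services(hostgroups, services_by_group):
    """Return {host: sorted service names} after applying hostgroup-level services."""
    per_host = {}
    for group, hosts in hostgroups.items():
        for host in hosts:
            per_host.setdefault(host, set()).update(services_by_group.get(group, []))
    return {host: sorted(names) for host, names in per_host.items()}


# Hypothetical members of the SERVER-HARDWARE hostgroup.
hostgroups = {"SERVER-HARDWARE": ["NW-ESX1", "NW-ESX2"]}
services_by_group = {"SERVER-HARDWARE": ["IPMI Sensors", "Ping", "HTTPS"]}

expanded = expand_services(hostgroups, services_by_group)
for host, services in expanded.items():
    print(host, services)
```

Add a third host to the group and it picks up all three services without touching the service definitions.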

Categories: C6100, Dell, nagios

Dell C6100 BIOS/BMC Configuration

July 3, 2013

I’ve recently purchased a Dell C6100 series 4-node Cloud Server for use in my lab. For those of you who might also be thinking of picking one up, or just for my own reference, I want to document the procedure for configuring the IPMI. These Dell units are not “PowerEdge” servers, so the BMC/IPMI is standard, and not the more typical iDRAC.

1) From the BIOS upon boot, configure the IPMI for an IP Address by pressing F2

clip_image001

clip_image002

2) Select ADVANCED

clip_image003

Select CPU CONFIGURATION:

clip_image004

Ensure that Intel VT-D, VT, and HT are enabled for your vSphere hosts. Press ESC.

Select USB CONFIGURATION:

clip_image005

If you like to install TO or FROM USB, ensure that the Controller Mode is HISPEED. Press ESC.

Select PCI CONFIGURATION:

clip_image006

For the NIC FUNCTION SUPPORT, when you press ENTER you get a list of options – PXE, ISCSI and DISABLED. Note that this is not intuitive – you’d think if you don’t want PXE or ISCSI, you’d want DISABLED. However, this disables the NICs entirely. So read this as “What Mode do you want the NIC to operate in?”. I have chosen PXE.

Then, on NIC1/NIC2 OPTION ROM:

clip_image007

You can choose to enable or disable the PXE or ISCSI boot process, to speed up your boots. Press ESC.

3) Select BOOT.

clip_image008

Select BOOT SETTINGS CONFIGURATION:

clip_image009

The important option here is FORCE USB FIRST. By doing so, you can not only force USB thumb drives to boot first, but they will be detected as HARD DISK vs REMOVABLE. This may be needed for some OS’s to install to USB thumb drives. Press ESC.

Select BOOT DEVICE PRIORITY:

clip_image010

Here you can also set the USB/Removable to be the first boot device. Press ESC.

Select HARD DISK DRIVES:

clip_image011

From this screen you can select the presentation order of the HDD’s. While I have an SSD present, I don’t want to install the OS to it, as I want to use it for caching and other purposes. So I have ensured that it is not going to be detected as the first hard drive. Press ESC.

4) Select SERVER:

clip_image012

The important options here for virtualization will likely be:

POWER MANAGEMENT = MAXIMUM PERFORMANCE

RESTORE AC ON POWER LOSS = POWER ON

AC POWER RECOVERY DELAY = IMMEDIATE

Select IPMI CONFIGURATION:

clip_image013

Set the BMC NIC as DEDICATED if you wish to use the 3rd port and not share any of the 2 on board NICS.

Select SET LAN CONFIGURATION:

clip_image014

This screen should be fairly self-explanatory. If your BMC is to be STATIC, then enter the IP ADDRESS, SUBNET MASK and DEFAULT GATEWAY IP. Press ESC.
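
If you would rather script the same three values, `ipmitool` has `lan set` subcommands for them. The Python sketch below only builds the command lines after checking that the gateway actually sits in the BMC's subnet; it does not run anything. LAN channel 1 is an assumption (check with `ipmitool lan print 1`), the addresses are examples, and I have not verified this against the C6100 BMC specifically:

```python
import ipaddress


def bmc_static_commands(ip, netmask, gateway, channel=1):
    """Build ipmitool command lines for a static BMC address (nothing is executed)."""
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is not inside {network}")
    base = ["ipmitool", "lan", "set", str(channel)]  # channel number is an assumption
    return [
        base + ["ipsrc", "static"],
        base + ["ipaddr", ip],
        base + ["netmask", netmask],
        base + ["defgw", "ipaddr", gateway],
    ]


# Example addresses only; substitute your own lab network.
commands = bmc_static_commands("192.168.1.50", "255.255.255.0", "192.168.1.1")
for cmd in commands:
    print(" ".join(cmd))
```

These would be run from an OS on the node itself with the IPMI driver loaded, which is exactly why doing it once in the BIOS is usually less hassle.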

5) Select EXIT

clip_image015

Then SAVE CHANGES and EXIT.

6) Load a web browser and launch an HTTPS session to the IP in question.

clip_image016

The default username and password are both “root”.

7) Click on CONFIGURATION on the left hand menu, and then NETWORK on the top menu.

clip_image017

Configure the appropriate settings. Note that the options for using DHCP to configure DNS and the DNS domain name only work if the IP address is also configured with DHCP. You cannot use a static address and DHCP for the remainder.

Click SAVE

8) Click on CONFIGURATION on the left hand menu, and then SMTP on the top menu.

clip_image018

Enter the IP address of the mail server. You’re going to be tempted to use a DNS name, so you can use load balancers, round robin, etc.

clip_image019

That’s not going to work, but we could have guessed that.

clip_image020

After entering the IP and clicking SAVE, press OK.
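
Since the field only takes an address, you can at least look the name up once and paste the result. A small Python sketch, where `mail.example.com` in the usage note is a placeholder for your own mail server name:

```python
import ipaddress
import socket


def is_ip_literal(value):
    """Return True if value is already an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def smtp_field_value(server):
    """Return an IPv4 address to type into the SMTP field, resolving a name if needed."""
    return server if is_ip_literal(server) else socket.gethostbyname(server)
```

Calling `smtp_field_value("mail.example.com")` gives you a single address at the moment you ran it, so any round robin or load balancing behind the name is still lost once you paste it in.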

9) Click on CONFIGURATION on the left hand menu, and then ALERTS on the top menu.

clip_image021

Select the first alert and click MODIFY.

clip_image022

Modify:

ALERT TYPE = EMAIL

EVENT SEVERITY = WARNING (or better)

EMAIL ADDRESS = <something appropriate>

SUBJECT = <something appropriate>

MESSAGE = <something appropriate>

Click SAVE

10) Click on CONFIGURATION on the left hand menu, and then USERS on the top menu.

clip_image023

Click MODIFY USER after selecting the “root” user.

clip_image024

Here you can set the USER NAME and PASSWORD. Also select the appropriate NETWORK PRIVILEGES. For example, you may want USER if the user should have read only access to logs and alerts. Click MODIFY to save the settings.

11) If you need to do a BIOS update, click on MAINTENANCE on the left hand menu, and then click ENTER UPDATE MODE.

clip_image025

12) If you need to do a POWER CONTROL, click on REMOTE CONTROL on the left hand menu, and then SERVER POWER CONTROL on the top menu.

clip_image026
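
The same power actions are normally available over the network with `ipmitool`, which is handy when you have four nodes to bounce. This Python sketch assumes the BMC answers IPMI-over-LAN with the `lanplus` interface, which I have not confirmed on the C6100. It passes the password through the `IPMI_PASSWORD` environment variable (ipmitool's `-E` option) so it stays off the command line. The address in the usage note is a placeholder:

```python
import os
import subprocess

POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def power_command(host, user, action="status"):
    """Build the ipmitool command line for a chassis power action."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", action]


def run_power(host, user, password, action="status"):
    """Run the action against a BMC; requires ipmitool to be installed."""
    env = dict(os.environ, IPMI_PASSWORD=password)  # read by ipmitool -E
    result = subprocess.run(power_command(host, user, action), env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `run_power("192.0.2.10", "root", "your-password")` would ask that BMC for its current power state.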

13) If you need to do a CONSOLE REDIRECTION, click on REMOTE CONTROL on the left hand menu, and then CONSOLE REDIRECTION on the top menu.

clip_image027

From here you can launch a java console which will allow remote control.

14) When you launch the JAVA CONSOLE, you get a JViewer screen pop up which looks like:

clip_image028

This is of course how I captured steps 1 through 5 above, after the fact.

The important details are the KEYBOARD menu:

clip_image029

And the DEVICE menu:

clip_image030

This is where you can remotely map to an ISO file on your hard drive.

clip_image031

Simply select the file and click OPEN.

clip_image032

Now when you click the DEVICE menu, you will see a check box next to REDIRECT ISO.

15) Reboot the server – either via Power Control, or using the Remote Console (e.g. by exiting the BIOS screen I’m currently showing).

clip_image033

Ensure you keep an eye on the screen. You want to press F11 to launch the BBS – Boot Selection menu.

clip_image034

Select the USB: AMI VIRTUAL CDROM

clip_image035

From here, it’s just like you’re at the console.

Categories: C6100, Dell, Hardware, Home Lab