
Archive for the ‘Dell’ Category

HOWTO: Dell vSphere 6.0 Integration Bits for Servers

October 5, 2015 3 comments

When I do a review of a vSphere site, I typically start by looking to see if best practices are being followed – then look to see if any of the 3rd party bits are installed. This post picks on Dell environments a little, but the same general overview holds true of HP, or IBM/Lenovo, or Cisco, or…. Everyone has their own 3rd party integration bits to be aware of. Perhaps this is the part where everyone likes Hyper Converged, because you don't have to know about this stuff. But as the person administering the environment, you should at least be aware of these bits, even if you're not an expert in them.

I'm not going to go into details as to how to install or integrate these components. I just wanted to make a cheat sheet for myself, and maybe remind some folks that regardless of your vendor, make sure you check for the extras – it's part of why you're not buying white boxes, so take advantage of it. Most of it is free!

The links:

I’ve picked on a Dell PowerEdge R630 server, but realistically any 13G box would have the same requirements. Even older 11/12G boxes such as an R610 or R620 would. So first we start with the overview page for the R630 – remember to change that OS selection to “VMware ESXi v6.0”
http://www.dell.com/support/home/us/en/04/product-support/product/poweredge-r630/drivers

 

Dell iDRAC Service Module (VIB) for ESXi 6.0, v2.2
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=2XHPY

You're going to want to be able to talk to and manage the iDRAC from inside of ESXi, so get the VIB that allows you to do so. It installs via VUM incredibly easily.

 

Dell OpenManage Server Administrator vSphere Installation Bundle (VIB) for ESXi 6.0, v8.2
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=VV2P2

Next, you’ll want to be able to handle talking to OMSA on the ESXi box itself, to get health, management, inventory, and other features. Again, this installs with VUM.
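If you prefer pushing these bundles with esxcli instead of VUM, something along these lines works over SSH. Treat it as a rough sketch: the host name, credentials, and the datastore path/file name of the offline bundle are placeholders for your own environment, and it assumes SSH is enabled on the host and that you have the paramiko module installed.

    # Minimal sketch: install a Dell offline bundle (iSM or OMSA VIB) with esxcli over SSH.
    # Host, credentials, and the bundle path/name below are placeholders for your environment.
    import paramiko

    HOST = "esxi01.lab.local"
    USER = "root"
    PASSWORD = "VMware1!"
    BUNDLE = "/vmfs/volumes/datastore1/OM-SrvAdmin-Dell-Web-8.2.0-ESXi60.zip"  # example name only

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username=USER, password=PASSWORD)

    # esxcli needs the absolute path to the depot zip; add --no-sig-check if your bundle complains.
    stdin, stdout, stderr = client.exec_command(f"esxcli software vib install -d {BUNDLE}")
    print(stdout.read().decode())
    print(stderr.read().decode())
    client.close()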

 

OpenManage™ Integration for VMware vCenter, v3.0
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=8V0JG

This will let your vCenter present you with various tools to manage your Dell infrastructure right from within vCenter. Installs as an OVF and is a virtual appliance, so no server required.

 

VMware ESXi 6.0
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=CG9FP 

Your customized ESXi installation ISO. Note the file name – VMware-VMvisor-Installer-6.0.0-2809209.x86_64-Dell_Customized-A02.iso – based on the -2809209 build number, the -A02 revision, and a quick Google search, you can see that this is v6.0.0b (https://www.vmware.com/support/vsphere6/doc/vsphere-esxi-600b-release-notes.html) rather than v6.0 U1.
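If you'd rather script that build-number check than Google it each time, a trivial lookup does the job. The table below only includes the 6.0 GA build and the 6.0.0b build from this ISO – extend it from VMware's official build-number list as needed:

    # Map the ESXi build number in a Dell customized ISO file name to a release name.
    # Only a couple of known 6.0 builds are listed; extend from VMware's build-number KB.
    import re

    KNOWN_BUILDS = {
        2494585: "ESXi 6.0 GA",
        2809209: "ESXi 6.0.0b",   # the build in the A02 Dell customized ISO above
    }

    def release_from_iso_name(iso_name):
        match = re.search(r"-(\d{7,})\.", iso_name)
        if not match:
            return "no build number found"
        build = int(match.group(1))
        return KNOWN_BUILDS.get(build, f"unknown build {build}")

    print(release_from_iso_name(
        "VMware-VMvisor-Installer-6.0.0-2809209.x86_64-Dell_Customized-A02.iso"))
    # -> ESXi 6.0.0b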

 

Dell Systems Management Tools and Documentation DVD ISO, v.8.2
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=4HHMH

You likely will not need this for a smaller installation, but it can help if you need to standardize, by allowing you to configure and export/import things like BIOS/UEFI, firmware, iDRAC, and LCC settings. Can't hurt to have around.

 

There is no longer a need for the "SUU" – Server Update Utility – as the Lifecycle Controller built into every iDRAC, even the Express, will allow you to do updates from that device. I recommend doing them via the network, as it is significantly less hassle than going through the Dell Repository Manager, downloading your copies to USB/ISO/DVD media, and doing it that way.

Now, the above covers what you'll require for vSphere. What is NOT immediately obvious are the tools you may want to use in Windows. Even though you now have management capability on the hosts and can see things in vCenter, you're still missing the ability to talk to devices and manage them from Windows – which is where I spend all of my actual time. Things like monitoring, control, management, etc., are all done from within Windows. So let's go ahead and change that OS selection to "Windows Server 2012 R2 SP1" and get some additional tools:

 

Dell Lifecycle Controller Integration 3.1 for Microsoft System Center Configuration Manager 2012, 2012 SP1 and 2012 R2
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=CKHYR

If you are a SCCM shop, you may very much want to be able to control the LCC via SCCM to handle hardware updates.

 

Dell OpenManage Server Administrator Managed Node(windows – 64 bit) v.8.2
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=6J8T3

Even though you've installed the OMSA VIBs on ESXi, there is no actual web server there. So you'll need to install the OMSA Web Server tools somewhere – it could even be your workstation – and use that. You'll then select "connect to remote node" and specify the target ESXi system and credentials.

 

Dell OpenManage Essentials 2.1.0
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=JW22C

If you're managing many Dell systems and not just servers, you may want to go with OME if you do not have SCCM or similar. It's a pretty good 3rd party SNMP/WMI monitoring solution as well, but it will also allow you to handle remote updates of firmware, BIOS, settings, etc., on various systems – network, storage, client, thin client, and so on.

 

Dell OpenManage DRAC Tools, includes Racadm (64bit),v8.2
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=9RMKR 

RACADM is a tool I've used before and have some links on how to use remotely. But this tool can greatly help you standardize your BIOS/iDRAC settings via a script.
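As a rough sketch of that kind of scripting – the exact racadm syntax varies by iDRAC generation, and the IPs and credentials here are placeholders – you could dump every host's config to a file for comparison like this:

    # Dump the iDRAC/BIOS config from a list of hosts with remote racadm so the
    # exports can be diffed for drift. Assumes the DRAC Tools (racadm) are installed
    # locally and the older 'getconfig -f' syntax; newer iDRACs use 'racadm get' instead.
    import subprocess

    IDRACS = ["10.0.0.51", "10.0.0.52", "10.0.0.53"]   # placeholder iDRAC IPs
    USER, PASSWORD = "root", "calvin"                  # change from the default, obviously

    for ip in IDRACS:
        outfile = f"idrac-{ip}.cfg"
        cmd = ["racadm", "-r", ip, "-u", USER, "-p", PASSWORD, "getconfig", "-f", outfile]
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(f"{ip}: {'ok' if result.returncode == 0 else result.stderr.strip()}")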

 

Dell Repository Manager, v2.1
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=2RXX2

The Repository Manager, as mentioned above, is a tool you can use to download only the updates required for your systems. Think of it like WSUS (ish).

 

Dell License Manager
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=68RMC

The iDRAC hardware is the same on every system; it is only the licence that changes. To apply the Enterprise license, you'll need the License Manager.

 

Hopefully this will help someone keep their Dell environment up to date. Note that I have NOT called out any Dell Storage items such as the MD3xxx, Equallogic PSxxxx, or Compellent SCxxxx products. If I tried to, the list would be significantly longer. Also worth noting is that some vendors' _networking_ products have similar add-ins, so don't forget to look for those as well.

Modifying the Dell C6100 for 10GbE Mezz Cards

June 11, 2015 5 comments

In a previous post, Got 10GbE working in the lab – first good results, I talked about getting 10GbE working with my Dell C6100.  Recently, a commenter asked me if I had any pictures of the modifications I had to make to the rear panel to make these 10GbE cards work.  As I recently acquired another C6100 (yes, I have a problem…) that needs the mods, it seems only prudent to share the steps I took in case it helps someone else.

First a little discussion about what you need:

  • Dell C6100 without the removable rear panel plate (the kind these instructions apply to)
  • Dell X53DF/TCK99 2 Port 10GbE Intel 82599 SFP+ Adapter
  • Dell HH4P1 PCI-E Bridge Card

You may find the Mezz card under either part number – it seems that the X53DF replaced the TCK99.  Perhaps one is the P/N and one is the FRU or some such.  But you NEED that little PCI-E bridge card.  It is usually included, but pay special attention to the listing to ensure it is.  What you DON'T really need is the mesh back plate on the card – you can get it bare.

2015-06-11 21.18.13, 2015-06-11 21.17.46

Shown above are the 2pt 10GbE SFP+ card in question, and also the 2pt 40GbE Infiniband card.  Above them both is the small PCI-E bridge card.

2015-06-11 21.19.24

You want to remove the two screws holding the backing plate on the card.  You won't be needing the plate, so set it aside.  The screws attach through the card and into the bracket, so once removed, reinsert them into the bracket to keep from losing them.

2015-06-11 21.17.14

Here we can see the back panel of the C6100 sled.  Ready to go for cutting.

2015-06-11 21.22.23, 2015-06-11 21.24.48

You can place the factory rear plate over the back panel as a template.  Here you can see where you need to line it up and mark the cuts you'll be doing.  Note that, of course, the bracket will sit higher up on the unit, so you'll have to adjust your horizontal lines accordingly.

2015-06-11 21.23.09, 2015-06-11 21.22.49

If we look to the left, we can see the source of the problem that causes us to have to do this work.  The back panel here is not removable, and wraps around the left corner of the unit.  In systems with the removable plate, it simply unscrews and the panel attached to the card slots in.  On the right hand side you can see the two screws that would attach the panel and card in that case.

2015-06-11 21.35.38

Here's largely what we get once we complete the cuts.  Perhaps you're better with a Dremel than I am.  Note that the vertical cuts can be tough depending on the size of the cutting disc you have, as the bar used to remove the sled can get in the way.

2015-06-11 21.36.16, 2015-06-11 21.36.20, 2015-06-11 21.36.28

You can now attach the PCI-E bridge card to the Mezz card, and slot it in.  I found it easiest to come in at about a 20 degree angle, slot the 2 ports into the cutouts, then drop the PCI-E bridge into the slot.  When it's all said and done, you'll find it pretty secure and good to go.

That's really about it.  Not a whole lot to it, and if you have it all in hand, you'd figure it out pretty quickly.  This is largely to show where my cut lines ended up compared to the actual cuts, and where adjustments could be made to make the cuts tighter if you wanted.  Also, if you're planning to order but are not sure if it works or is possible, then this should help out quite a bit.

Some potential vendors I’ve had luck with:

http://www.ebay.com/itm/DELL-X53DF-10GbE-DUAL-PORT-MEZZANINE-CARD-TCK99-POWEREDGE-C6100-C6105-C6220-/181751541002? – accepted $60 USD offer.

http://www.ebay.com/itm/DELL-X53DF-DUAL-PORT-10GE-MEZZANINE-TCK99-C6105-C6220-/181751288032?pt=LH_DefaultDomain_0&hash=item2a513890e0 – currently lists for $54 USD, I’m sure you could get them for $50 without too much negotiating.

Categories: C6100, Dell, Hardware, Home Lab

HOWTO: Migrate RAID types on an Equallogic array

October 17, 2014 Leave a comment

I've run into a situation where I need to change RAID types on an Equallogic PS4100 in order to provide some much needed free space.  Equallogic supports on-the-fly migration as long as you follow a supported migration path:

clip_image002

  • RAID 10 can be changed to RAID50 or RAID6
  • RAID 50 can be changed to RAID6
  • RAID 6 cannot be converted.

By changing from RAID50 to RAID6 on a 12x600GB SAS unit, we can go from 4.1TB to 4.7TB, which will help get some free space and provide some extra life to this environment. 
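If you want to ballpark that gain before committing, the rough math looks like this. The parity and hot-spare counts are my assumptions about how the PS4100 lays out RAID 50 vs RAID 6, and the array reserves some space for its own use, so expect the GUI to report somewhat lower figures:

    # Ballpark usable-space comparison for a 12x 600GB Equallogic member.
    # Assumed layouts (not exact Equallogic internals): RAID 50 = 2 parity + 2 hot spares,
    # RAID 6 = 2 parity + 1 hot spare; a 600GB SAS drive formats to roughly 558 GiB.
    DRIVES = 12
    DRIVE_GIB = 558

    def usable_tib(parity, spares):
        return (DRIVES - parity - spares) * DRIVE_GIB / 1024

    raid50 = usable_tib(parity=2, spares=2)
    raid6 = usable_tib(parity=2, spares=1)
    print(f"RAID 50 ~ {raid50:.2f} TiB, RAID 6 ~ {raid6:.2f} TiB, "
          f"gain ~ {(raid6 - raid50) * 1024:.0f} GiB ({(raid6 / raid50 - 1) * 100:.0f}%)")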

1) Login to the array, and click on the MEMBER, then MODIFY RAID CONFIGURATION:

clip_image004

Note that the current RAID configuration is shown as “RAID 50” and STATUS=OK.

2) Select the new RAID Policy of RAID6:

clip_image006

Note the change in space – from 4.18TB to 4.69TB, and a net change of 524.38GB, or about 12% extra space.  Click OK.

3) During the conversion, the new space is not available – which should be expected:

clip_image008

After the conversion, the space will be available.  Until then, the array status will show as “expanding”, as indicated.  Click OK.

4) You can watch the status and see that the RAID Status does indeed show “expanding” and a PROGRESS of 0%:

clip_image010

After about 7 hours, we're at 32% complete.  Obviously this is going to depend on the amount of data, size of disks, load on the array, etc.   But we can safely assume this will take the better part of a day to complete.
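A straight-line extrapolation from that progress counter is close enough for planning the window:

    # Rough estimate of total expansion time from the observed progress so far.
    hours_elapsed = 7
    progress = 0.32                     # 32% complete
    total_hours = hours_elapsed / progress
    print(f"~{total_hours:.0f} hours total, ~{total_hours - hours_elapsed:.0f} hours to go")
    # ~22 hours total, ~15 hours to go -- so plan for around a day.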

5) When the process completes, you will see that the RAID Status is OK as well as the MEMBER SPACE area will show free space:

clip_image012

Understandably, you now need to use this space.  It won't be automatically applied to your existing volumes/LUNs, so you're left with two obvious choices – grow an existing volume or create a net new one.  As creating a net-new volume is straightforward, I'll demonstrate how to grow an existing one.

6) On the bottom left of the interface, select VOLUMES:

clip_image014

Then in the upper left, expand the volumes:

clip_image016

Select the volume you wish to grow.  I’ll choose EQVMFS1.

clip_image018

Click MODIFY SETTINGS and then the SPACE tab.  Change the volume size accordingly.  It does indicate what the max (1.34TB) can be.  I would highly recommend you reserve at least some small portion of space – if you ever completely fill a volume, you may need to grow it slightly to even be able to mount it.  Even if it's small, always leave an escape route.

Click OK.

clip_image020

You are warned to create a snapshot first.  As these volumes are empty, we won’t be needing to do this.  Click NO.

clip_image022

Note the volume size now reports as 1.3TB.

7) Next, we go to vSphere to grow the volume. 

Right click on the CLUSTER and choose RESCAN FOR DATASTORES:

clip_image024

Next, once that completes (watch the Recent Tasks panel), select a host with the volume mounted and go to the CONFIGURATION -> STORAGE tab.  Right click on the volume and choose PROPERTIES.

clip_image026

8) Click INCREASE on the next window:

clip_image028

Then select the LUN in question:

clip_image030

NOTE that in this example, I'm upgrading a VMFS3 volume.  It will ultimately be blown away and recreated as VMFS5.  But if you are doing this, you will see warnings if you try to grow above 2TB, as it indicates.  Click NEXT.

clip_image032

Here we can see the existing 840GB VMFS as well as the new Free Space of 491GB.  Click NEXT.

clip_image034

Choose the block size, if it allows you.  Again, this is something you won’t see on a VMFS5 datastore.  Click NEXT and then FINISH.

9) As this is a clustered volume, once complete, it will automatically trigger a rescan on all the remaining cluster hosts to pick up the change:

clip_image036

You don’t have to do anything for this to happen. 

And that’s really about it.  You have now expanded the RAID group on the Equallogic, and added the space to an existing volume.  Some caveats of course to mention at this point:

  • Changing RAID types will likely alter your data protection and performance expectations.  Be sure you have planned for this.
  • As noted before, once you go RAID6 you can't go anywhere from there without an offload and complete rebuild of the array.
  • If you hit the wall, and got back ~ 10%, this is your breathing room.  You should be evaluating space reclamation tactics, new arrays, etc.  This only gets you out of today’s jam.
Categories: Dell, Equallogic, ISCSI, Storage, vSphere

Design Exercise–Scaling Up–Real World Example

October 13, 2014 Leave a comment

My previous post on Design Exercise- Scaling up vs Scaling out appeared to be quite popular. A friend of mine recently told me of an environment, and while I have only rough details of it, it gives me enough to make a practical example of a real world environment – which I figured might be fun. He indicated that while we’d talked about the ideas in my post for years, it wasn’t until this particular environment that it really hit home.

Here are the highlights of the current environment:

  • Various versions of vSphere – v3.5, v4.x, v5.x, multiple vCenters
  • 66 hosts – let's assume dual six core Intel 55xx/56xx (Nehalem/Westmere) CPU's
  • A quick tally suggests 48GB of RAM per host.
  • These hosts are blades, likely HP. 16 Blades per chassis, so at least 4 chassis. For the sake of argument, let’s SAY it’s 64 hosts, just to keep it nice and easy.
  • Unknown networking, but probably 2x 10GbE, and 2x 4Gbit/FC, with passthru modules

It might be something very much like this. In which case, it might be dual 6 core CPU’s, and likely only using 1GbE on the front side. This is probably a reasonable enough assumption for this example, especially since I’m not trying to be exact and keep it theoretical.

http://www.ebay.ca/itm/HP-c7000-Blade-Chassis-16x-BL460c-G6-2x-6-C-2-66GHz-48GB-2x-146GB-2x-Gbe2c-2x-FC-/221303055238?pt=COMP_EN_Servers&hash=item3386b0a386

I’ve used the HP Power Advisor (http://www8.hp.com/ca/en/products/servers/solutions.html?compURI=1439951#.VDnvBfldV8E) to determine the power load for a similarly configured system with the following facts:

  • 5300 VA
  • 18,000 BTU
  • 26 Amps
  • 5200 Watts total
  • 2800 Watts idle
  • 6200 Watts circuit sizing
  • 6x 208V/20A C19 power outlets
    clip_image001

We’ll get to that part later on. For now, let’s just talk about the hosts and the sizing.

Next, we need to come up with some assumptions.

  • The hosts are likely running at 90% memory and 30% CPU, based on examples I’ve seen. Somewhere in the realm of 2764GB of RAM and 230 Cores.
  • The hosts are running 2 sockets of vSphere Enterprise Plus, with SnS – so we have 128 sockets of licences. There will be no theoretical savings on net-new licences as they’re already owned – but we might save money on SnS. There is no under-licencing that we’re trying to top up.
  • vSphere Enterprise Plus we’ll assume to be ~ $3500 CAD/socket and 20% for SnS or about $700/year/socket.
  • The hosts are probably not licenced for Windows Data Center, given the density – but who knows. Again, we’re assuming the licences are owned, so no net-new savings but there might be on Software Assurance.
  • We’re using at least 40U of space, or a full rack for the 4 chassis
  • We're using 20,800 Watts, or about 21 kW
  • While the original chassis are likely FC, let's assume for the moment that it's 10GbE iSCSI or NFS.

Now, let’s talk about how we can replace this all – and where the money will come from.

I just configured some Dell R630 1U Rack servers. I’ve used two different memory densities to deal with some cost assumptions. The general and common settings are:

  • Dell R630 1U Rack server
  • 2x 750 Watt Power Supply
  • 1x 250GB SATA, just to have "a disk"
  • 10 disk 2.5” chassis – we won’t be using local disks though.
  • 1x PERC H730 – we don’t need it, but we’ll have it in case we add disks later.
  • Dual SD module
  • 4x Emulex 10GbE CNA on board
  • 2x E5-2695 v3 2.3GHz 14C/28T CPU’s

With memory we get the following numbers:

  • 24x 32GB for 768GB total – $39.5K Web Price, assume a 35% discount = $26K
  • 24x 16GB for 384GB total – $23.5K Web Price, assume a 35% discount = $15.5K

The first thing we want to figure out is whether the memory density is cost effective. We know that 2x of the 384GB configs would come to $31K, or about $5K more than a single 768GB server. So even without bothering to factor in licencing costs, we know the denser box is cheaper. If you had to double up on vSphere, Windows Data Center, Veeam, vCOPS, etc., then it gets worse. So very quickly we can make the justification to only include the 768GB configurations. So that's out of the way. However, it also tells us that if we need more density, we do have some wiggle room to spend more on better CPU's with more cores/speeds – we can realistically spend roughly an extra $2.5K per CPU and still come out the same as doubling the hosts with half the RAM.

Now how many will we need? We know from above "Somewhere in the realm of 2764GB of RAM and 230 Cores". 230 cores / 28 cores per server means we need at least 8.2 hosts – we'll assume 9. 2764GB of RAM only requires 3.6 hosts (a quick sizing sketch follows the list below). But we also need to assume we'll need room for growth. Based on these numbers, let's work with the understanding that we'll want at least 10 hosts, to give us some overhead on the CPU's and room for growth. If we're wrong, we have lots of spare room for labs, DEV/TEST, finally building redundancy, expanding poorly performing VM's, etc. No harm in that. This makes the math fairly easy as well:

  • $260K – 10x Dell R630’s with 768GB
  • $0 – net-new licencing (we'll reuse the existing vSphere licences, so there's nothing new to buy)
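Here's the sizing math from above as a tiny sketch, in case you want to play with different host configurations. The inputs are simply the assumptions from the environment summary:

    # Sizing sketch: how many new hosts does the existing load actually need?
    # Inputs are the assumptions above: ~64 hosts x 12 cores x 30% CPU,
    # ~64 hosts x 48GB x 90% RAM, replaced by 2x 14-core / 768GB R630s.
    import math

    used_cores = 230
    used_ram_gb = 2764

    cores_per_host = 28       # 2x E5-2695 v3, 14 cores each
    ram_per_host_gb = 768     # 24x 32GB

    need_for_cpu = math.ceil(used_cores / cores_per_host)    # 9
    need_for_ram = math.ceil(used_ram_gb / ram_per_host_gb)  # 4
    hosts = max(need_for_cpu, need_for_ram)
    print(f"CPU wants {need_for_cpu} hosts, RAM wants {need_for_ram};"
          f" plan for {hosts}+ and round up for growth and N+1")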

We've now cost the company $260K and, so far, haven't shown any savings or justification. Even just based on hardware refresh and lifecycle costs, this is probably a doable number. It works out to $7.2K/month over 36 months.

What if we could get some of that money back? Let’s find some change in the cushions.

  • Licence SnS savings. We know we only need 20 sockets now to licence 10 hosts, so we can potentially let the other 108 sockets lapse. At $700/socket/year this results in a savings of $75,600 per year, or $227K over 36 months. This is 87% of our purchase cost for the new equipment. We only need to find $33K now.
  • Power savings (the rough math is sketched after this list).
    clip_image002
    The Dell Energy Smart Solution Advisor (http://essa.us.dell.com/dellstaronline/Launch.aspx/ESSA?c=us&l=en&s=corp) suggests that each server will require 456 Watts, 2.1 Amps and 1600 BTU of cooling. So our two solutions look like:
    clip_image003
    I pay $0.085/kWh here, so I'll use that number. In the co-location facilities I'm familiar with, you're charged per power whip, not usage. But as this environment is on site, we can assume they're being charged only for what they use.
    We've now saved another $1K/month, or $36K over 36 months. We have saved $263K on a $260K purchase. How am I doing so far?

  • Rack space – we’re down from 40U to 10U of space. Probably no cost savings here, but we can reuse the space
  • Operational Maintenance – we are now doing Firmware, Patching, Upgrades, Host Configuration, etc, across 10 systems vs 64. Regardless of if that time accounts for 1 or 12 hours per year per server, we are now doing ~ 84% less work. Perhaps now we’ll find the time to actually DO that maintenance.
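Both of those savings lines are easy to sanity-check. Here's a rough sketch using the SnS rate, the chassis and per-server wattages, and the $0.085/kWh rate quoted above – it assumes utility-style billing by usage, not per-whip co-lo billing:

    # Where most of the $260K comes back from: lapsed SnS plus the power delta, over 36 months.
    MONTHS = 36
    HOURS_PER_MONTH = 24 * 365 / 12          # ~730

    sns_savings = 108 * 700 * 3              # 108 lapsed sockets @ $700/socket/year for 3 years
    old_kw = 4 * 5.2                         # 4x c7000 chassis at ~5200W loaded
    new_kw = 10 * 0.456                      # 10x R630 at ~456W each
    power_savings = (old_kw - new_kw) * HOURS_PER_MONTH * 0.085 * MONTHS

    print(f"SnS: ${sns_savings:,.0f}  Power: ${power_savings:,.0f}  "
          f"Total: ${sns_savings + power_savings:,.0f} vs the $260,000 spent")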

So based on nothing more than power and licence *maintenance*, we’ve managed to recover all the costs. We also have drastically consolidated our environment, we can likely “finally” get around to migrating all the VM’s into a single vSphere v5.5+ environment and getting rid of the v3.5/v4.x/etc mixed configuration that likely was left that way due to “lack of time and effort”.

We also need to consider the "other" ancillary things we're likely forgetting as benefits. Every one of these things that a site of this size might have represents a potential savings – either in net-new or maintenance:

  • vCloud Suite vs vSphere
  • vCOPS
  • Veeam or some other backup product, per socket/host
  • Windows Server Data Center
  • SQL Server Enterprise
  • PernixData host based cache acceleration
  • PCIe/2.5” SSD’s for said caching

Maybe the site already has all of these things. Maybe they’re looking at it for next year’s budget. If they have it, they can’t reduce their licences, but could drop their SnS/Maintenance. If they’re planning for it, they now need 84% less licencing. My friends in sales for these vendors won’t like me very much for this, I’m sure, but they’d also be happy to have the solution be sellable and implemented and a success story – which is always easier when you don’t need as many.

I always like to provide more for less. The costs are already a wash, what else could we provide? Perhaps this site doesn’t have a DR site. Here’s an option to make that plausible:

  • $260K – 10x R630’s for the DR site
  • $0K – 20 sockets of vSphere Enterprise – we’ll just reuse some of the surplus licencing. We will need to keep paying SnS though.
  • $15K – 20 sockets of vSphere Enterprise SnS
  • $40K – Pair of Nexus 5548 switches? Been a while since I looked at pricing
    Spend $300K and you have most of a DR environment – at least the big part. You still have no storage, power, racks, etc. But you’re far closer. This is a much better use of the same original dollars. The reason for this part of the example is because of the existing licences and we’re not doing net-new. The question of course from the bean-counters will be “so what are we going to do, just throw them away???”

Oh. Right. I totally forgot. Resale :)

http://www.ebay.ca/itm/HP-C7000-Blade-Enclosure-16xBL460C-G6-Blades-2xSix-Core-2-66GHZ-X5650-64GB-600GB-/271584371114?pt=COMP_EN_Servers&hash=item3f3bb0a1aa

There aren't many C7000/BL460C systems listed as "Sold" on eBay, but the above one sold for ~ $20K Canadian. Let's assume you chose to sell the equipment to a VAR that specializes in refurbishing – they're likely to provide you with 50% of that value. That's another $10K/chassis, or $40K for the 4 chassis.

As I do my re-read of the above, I realize something. We need 9 hosts to meet CPU requirements, but we'd end up with 7680GB of RAM where we only really require 2764GB today. Dropping to 512GB per host brings the cost down to ~ $31K Web Price, or $20K with the 35% discount. At a savings of $6K/server, we'd end up with 5120GB of RAM – just about double what we use today, so lots of room for scale up. We'll save another $60K today. In the event that we ever require that capacity, we can easily purchase the extra 8x 32GB per host at a later date – and likely at a discount as prices drop over time. However – often the original discount is not applied to parts and accessory pricing for a smaller deal, so consider whether it actually is a savings. How would you like a free SAN? :) Or 10 weeks of training @ $6K each? I assume you have people on your team who could benefit from some training? Better tools? Spend your money BETTER! Better yet, spend the money you're entrusted to be the steward of better – it's not your money, treat it with respect.

A re-summary of the numbers:

  • +$200K – 10x R630’s with 512GB today
  • +$0K – net-new licencing for vSphere Enterprise Plus
  • -$227K – 108 sockets of vSphere SnS we can drop, over 3 years.
  • -$36K – Power savings over 3 years
  • -$40K – Resale of the original equipment

Total: $103K to the good.

 

Footnote: I came back thinking about power.  The Co-Location facility I’ve dealt with charges roughly:

  • $2000/month for a pair of 208V/30A circuits
  • $400/month for a pair of 110V/15A circuits
  • $Unknown for a pair of 20A circuits, unfortunately.

I got to thinking about what this environment would need – but also what it has.  In my past, I've seen a single IBM BladeCenter chassis using 4x 208V/30A circuits, even if it could have been divided up better.  So let's assume the same inefficiency was done here.  Each HP C-Series chassis at 25.4A would require 3x pairs, or 12x pairs for the total configuration – somewhere in the area of $24,000/month in power.  Yikes!  Should it be less?  Absolutely.  But it likely isn't, based on the horrible things I've seen – probably people building as though they're charged by usage and not by drop.

The 10x rack servers, if I switch them to 110V vs 208V, indicate they need 3.5A each – which is across both circuits.  This I think is at max, but let's be fair and say you wouldn't put more than 3x (10.5A) on a 15A circuit.  So you need 4x $400 pairs, for $1600/month in power.  Alternatively, you could put them all on a 208V/30A pair for 21A total, for $2000/month.  If you could, this would be the better option, as it lets you use only one pair of PDU's, and you have surplus for putting in extra growth, top of rack switching, etc.

So potentially, you're also going to go from $24K to $2K/month in power.  For the sake of argument, let's assume I'm way wrong on the blades and they're using half the power, or $12K.  You're still saving $10K/month – or $360K over 36 months.  Did you want a free SAN for your DR site maybe?  Just don't also count the earlier power numbers that were based on usage rather than drops, or you're double dipping on your savings.
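For the curious, the footnote's circuit math in a few lines – the per-pair rates are the ones listed above, and the blade-side count assumes the same 3-pairs-per-chassis inefficiency described in the previous paragraph:

    # Co-lo power cost: billed per circuit pair (drop), not per kWh used.
    blade_pairs = 4 * 3                    # 4 chassis x 3 pairs of 208V/30A
    blade_monthly = blade_pairs * 2000     # $2000/month per 208V/30A pair

    rack_monthly_110 = 4 * 400             # 10 servers, max 3 per 15A circuit -> 4 pairs of 110V/15A
    rack_monthly_208 = 1 * 2000            # or one 208V/30A pair (~21A total), with room to grow

    print(f"Blades: ${blade_monthly:,}/month   "
          f"Rack servers: ${rack_monthly_110:,}/month (110V) or ${rack_monthly_208:,}/month (208V)")
    # Even if the blade estimate is off by half, that's still ~$10K/month of difference.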

(New) Total: $427K to the good – AFTER getting your new equipment. 

Hi.  I just saved you half a million bucks

Categories: Dell, Design, Hardware, VMware

Design Exercise-Fixing Old or Mismatched Clusters

October 2, 2014 Leave a comment

In two previous posts, I talked about some design examples I’ve seen:

Design Exercise – Scaling up vs Scaling out

Design Exercise – DR Reuse

Today I’m going to talk about the “No problem, we’ll just add a host” problem.  But not in the “one more of the same” scenario, instead a “we can’t get those any longer, so we’ll add something COMPLETELY different” scenario.

Regardless of whether the current site is something like previously described, with matching systems (eg: 4x Dell PE2950's), or a random mix, when capacity runs out the budget is often low, and so the discussion comes up to "just add a host". But as we know from previous examples, adding additional hosts costs money for not only hardware, but licences. I have two different example sites to talk about:

Example 1:

  • 4x Dell PE2950, 2x 4 Core, 32GB RAM, 4x 1GBE hosts

Example 2:

  • 1x Dell T300, 1x 4 Core, 32GB RAM, 2x 1GbE
  • 1x HP DL380 G6, 2x 4 Core, 64GB RAM, 4x 1GbE
  • 1x Dell R610, 2x 6 Core, 96GB RAM, 4x 1GbE

In both cases, we’ll assume that the licencing won’t change as we’re not going to discuss actually adding any hosts, so all software/port counts remain the same.

As you can see, neither environment is particularly good. They're both old, but Example 2 is horribly mismatched. DRS is going to have a hell of a time finding proper VM slots to use, the capacity is mismatched, and nothing is uniform. The options to fix this all involve investing good money after bad. But often an environment that is this old or mismatched likely ended up this way due to lack of funds. We can talk about proper planning and budgeting until we're blue in the face, but what we need to do right now is fix the problem. So let's assume that even if we could add or replace one of the hosts with something more current, like your $7000 R620 with 2x 6 Core and 128GB, this is not in budget. Certainly, 3-4 of them is not, and certainly not the bigger/better systems at $10K+.

So what if we go used? Ah, I can hear it now, the collective rants of a thousand internet voices. "But we can't go used, it's old and it might fail, and it's past its prime". Perhaps – but look at what the environments currently are. Plus, if someone had something 'newer' that they'd owned for 2 years into a 3-5 year warranty, it would be "used" as well, no? Also, setting aside complete and spontaneous host failures, virtualization and redundancy give us a lot of ways to mitigate actual hardware failures. Failing network ports, power supplies, fans, etc., will all trigger a Host Health alert. This can be used to automatically place the host in Maintenance Mode, have DRS evacuate it, and send you an e-mail. So yes, a part may fail, but we build _expecting_ that to be true.

Now assume that the $7000 option for a new host *IS* in budget. What could we do instead? We certainly don’t want to add a single $7000 host to the equation, for all the reasons noted. Now we look into what we can do with off-lease equipment. This is where being a home-labber has its strengths – we already know what hardware is reliable and plentiful, and still new enough to be good and not quite old enough to be a risk.

What if I told you that for about $1500 CAD landed, you could get the following configuration:

Example 1 can now, for around $6000 CAD, replace all 4 hosts with something newer, that will have 16 more cores, and 4x the RAM. It’s not going to be anywhere near the solution from the other day with the 384GB hosts – but it’s also not going to be $40K in servers. Oh, plus 8U to 4U, power savings, etc.

Example 2 is able to replace those first 2 hosts and standardize, for around $3000.

In either case, they’re still “older” servers. A Dell R610 is circa 2009-2012, so you’re still looking at a 2-5 year old server at this point – which might be a little long in the tooth. But if the power is enough for you, and you’re just trying to add some capacity and get out of “scary old” zone, it might not be so bad. Heck, either of these sites are likely going to be very happy with the upgrades. Questions will need to be answered such as:

  • Lifespan – how long are we expecting these servers to be a solution for? Till the end of next calendar year or about 16-18 months? That’s fine.
  • Budget – are we doing this because we have run out of budget for this year but *NEED* “something”? Has next year’s budget been locked away and this was ‘missed’, but you still need ‘something’?  If we assume these are 18 month solutions, to get us from now (Oct 2014) to “after next budget year” (Jan 2016), then Example 1 is $333/month and Example 2 is $167/month. Money may be tight, but that’s a pretty affordable way of pushing off the reaper.  Heck, I know people with bigger cell phone bills.
  • Warranty – these may or may not come with OEM warranty. Are you okay with that? Maybe what makes the most sense is to just pick up an extra unit for "self-warranty" – it is almost certainly still cheaper than extending the OEM warranty. Remember though, OEM support also helps troubleshoot weird issues and software incompatibilities, etc. Self-warrantying just gets you hard parts that you can swap – if you have the time and energy to do so. Check if the secondary market reseller will offer next day parts; that may be sufficient for you. Also, check if the vendor of the hardware you're choosing will allow you to download software updates (eg: management software, firmware, BIOS, etc) without a service contract. Dell, at this point, still does, which is why I like them (for customers and my lab).  Oh, an advantage of the extra unit for "self-warranty"?  You can use it for Dev/Test, learning, testing things you want to try, validating hardware configurations, swapping parts for testing suspected issues, etc.
  • Other Priorities – do you need to spend the same money you’d spend on new hosts, elsewhere? Maybe you need a faster SAN today, because you’re out of capacity as well, and you have to make a choice. You can fix it next year, but you can’t fix both at once, regardless of effort or good intentions. Maybe you want to go to 10GbE switches today in preparation. Perhaps you want to spend the same money on training, so that your staff can “do more with less” and have “smarter people” instead of “more thingies, with no one to run them”.

I fully realize that off-lease, eBay, secondary market equipment is going to throw up automatic "no's" for a lot of people. Also, many management teams will simply say no. Some will have an aversion to "buying from eBay" – fine, call the vendor from their eBay auction, get a custom quote with a PO directly, and buy it just like you would from any other VAR. The point of the matter is, you have options, even if you're cash strapped.

BTW, if anyone was thinking "why not just get R620's" which are newer, you certainly could – http://www.ebay.ca/itm/DELL-POWEREDGE-R620-2-x-SIX-CORE-E5-2620-2-0GHz-128GB-NO-HDD-RAILS-/111402343301?pt=COMP_EN_Servers&hash=item19f018db85. One can get an R620, 2x 6 Core E5-2620, 128GB RAM (16x8GB almost certainly, but 24 DIMM slots), 4x 1GbE, iDRAC, etc, for about $3000. This would give you more room to grow and is newer equipment, but it starts getting much closer to the $7000 configuration direct from Dell with 3 year warranty, 10GbE ports, etc. Still, 4x $3K is much less than 4x $7K, and $16,000 is a lot of money you could spend on something else. Just watch that you're not paying so close to retail that it stops being worth it.

The trick, coming from a home-lab guy, is to be “just old enough to not be worth any money to someone else” but “just new enough to still be really useful, if you know what you’re doing.”

Also, consider these options for the future.  Remember that ROI involves a sale.  Let’s say you purchased the brand new $7000 servers and made it 5 year warranty vs 3 year for… 20% more or about $8500.  You’re almost certainly not going to use it for 5 years.  But in 2.5 years, when you want to put that server on the secondary market, and it still has 2+ years of OEM warranty left – you’re going to find it has significantly more resale value.

This is no different than leasing the ‘right’ car with the ‘right options’, because you know it’ll have a higher resale value at the end of the lease.  If you’re the kind of person that would never “buy new, off the lot” and would always buy a “1-2 year old lease-return, so someone else can pay the depreciation” – this solution is for you.

If in one scenario you haul the unit away to recycling (please, call me, I offer this service for free :) ), and in another you sell the equipment to a VAR for $2000/unit that you can use as credit on your next purchase or services…

Categories: Dell, Design, Hardware, VMware, vSphere

Design Exercise–DR or Dev/Test Re-use

October 1, 2014 1 comment

In a previous post, I recently discussed some of the benefits of Scaling Up vs Scaling Out (https://vnetwise.wordpress.com/2014/09/28/design-exercise-scaling-up-vs-scaling-out/) and how you can save money by going big. In that example, the site already had 4 existing hosts, wanted 5 new ones, but settled on 3. We can all guess of course what the next thing to get discussed was, I’m sure…

“So let’s reuse the old 4 hosts, because we have them, and use them… for DR or a DEV/TEST environment”. This should be no surprise, that “because we have them” is a pretty powerful sell. Let’s talk about how that might actually cost you considerably more money than you should be willing to spend.

Just as quick reminder, a summary of the hardware and configurations in question:

OLD HOSTS: Dell PowerEdge 2950 2U, 2x E5440 2.8GHz 4 Core CPU, 32GB DDR2

NEW HOSTS: Dell PowerEdge R620 1U, 2x E5 2630L 2.4GHz 6 Core CPU, 384GB DDR3

1) Licencing

Our example assumes that the site needed new licencing for the new hardware – either it didn’t have any, it expired, it was the wrong versions, who knows. So if you reutilize those 4 hosts, you’re going to need 4-8 licences for everything. Assuming the same licence types and versions (eg: vSphere Enterprise Plus, Windows Server Data Center, Veeam Enterprise, etc. ) that works out to be:

  • 4x $0 hosts as above = $0
  • 8x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $28,000
  • 4x Windows Server Data Center licences @ ~ $5000/each = $20,000
  • 8x Sockets of Veeam B&R licences @ ~ $1000/each = $8,000

Total Cost = $56,000

Total Resources = 89GHz CPU, 128GB RAM

That’s a lot of licencing costs, for such little capacity to actually run things.

2) Capacity

That’s only 128GB of RAM to run everything, and 96GB when taking into account N+1 maintenance. Even if it IS DEV/TEST or DR, you’ll still need to do maintenance. These particular servers COULD go to 64GB each, using 8GB DIMM’s, but they’re expensive and not really practical to consider.

3) Connectivity

Let's assume part of why you were doing this is to get rid of 1GbE in your racks. Maybe they're old. Maybe they're flaky. Maybe you just don't want to support them. In any event, let's assume you "need" 10GbE on them, if for no other reason but so that your Dev/Test *actually* looks and behaves like Production. No one wants to figure out how to do things in Dev with 12x1GbE and then try to reproduce it in Prod with 4x10GbE and assume it's all the same. So you'll need:

  • 8x 2pt 10GbE PCIe NICs @ $500 each = $4000
  • 8x TwinAx SFP+ cables @ $50 each = $400

We’ve now paid $4400 to upgrade our hosts to be able to use the same 10GbE infrastructure we were using for Prod. For servers that are worth maybe $250 on Kijiji or eBay (http://www.ebay.ca/itm/Dell-Poweredge-III-2950-Server-Dual-Quad-Core-2-83GHz-RAID-8-Cores-64Bit-VT-SAS-/130938124237?pt=COMP_EN_Servers&hash=item1e7c8537cd). Not the best investment.

4) Real Estate / Infrastructure

Re-using these existing hosts means 8U of space, probably 2x the power required, and likely internal RAID and disks that are just burning up power and cooling.

A quick summary shows that we've now spent somewhere in the area of $60,000 to "save money" by reusing our old hardware. This will take up 8U of rack space, probably consume 1600W of power, and we're investing new money in very old equipment.

But what if we did something similar to what we did with the primary cluster for Prod, and just bought… 2 more of the bigger new hosts?

2x Dev/Test Hosts @ 384GB:

  • 2x $11,500 hosts as above = $23,000
  • 4x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $14,000
  • 2x Windows Server Data Center licences @ ~ $5000/each = $10,000
  • 4x Sockets of Veeam B&R licences @ ~ $1000/each = $4,000
  • 8x SFP+ TwinAx cables @ ~ $50/each = $400

Total Cost = $51,400

Total Resources = 57.6GHz CPU, 768GB RAM

Compared to:

Total Resources = 89GHz CPU, 128GB RAM

So we've now spent only $51,400 vs $60,000, and ended up with 6x the capacity on brand new, in-warranty, modern hardware. The hardware is 100% identical to Prod. If we need or want to do any sort of testing in advance – vSphere patches, firmware upgrades, hardware configuration changes – we can now do so in Dev/Test and validate that it will behave EXACTLY the same way in Prod, because the hardware IS in fact identical. All of your training and product knowledge will also be the same, as you don't have to consider variances in generations of hardware. We're also going to use 2U and probably 600W of power vs 8U and 1600W.
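Putting the two options side by side in a few lines makes the comparison easy to re-run with your own numbers – the per-unit prices below are the same rough figures used throughout these posts:

    # Reusing the 4x PE2950s vs buying 2x new 384GB R620s, using the rough prices above.
    def option_cost(hosts, host_price, dc_licences, extras=0):
        sockets = hosts * 2
        return (hosts * host_price          # hardware
                + sockets * 3500            # vSphere Enterprise Plus w/ SnS, per socket
                + dc_licences * 5000        # Windows Server Data Center, per host
                + sockets * 1000            # Veeam B&R, per socket
                + extras)

    reuse_old = option_cost(4, 0, 4, extras=8 * 500 + 8 * 50)   # 8x 10GbE NICs + TwinAx to match Prod
    buy_new = option_cost(2, 11500, 2, extras=8 * 50)           # just the TwinAx cables

    print(f"Reuse 4x PE2950: ${reuse_old:,} for 128GB RAM / ~89GHz")
    print(f"Buy 2x new R620: ${buy_new:,} for 768GB RAM / ~57.6GHz")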

If this is all in one site and being used as Dev/Test, you have a couple of ways you could set this up. We're assuming this is all on the same SAN/storage, so we're not creating 100% segregated environments. Also, the 10GbE switching will be shared. So do you make a 3 node Prod and a 2 node Dev/Test cluster? Or do you make a 5 node cluster with Prod and Dev/Test Resource Pools and use NIOC/SIOC to handle performance issues?

If this is for a second site, to potentially be used as DR, we’ve now saved $30K on the original solution and $8K on the solution we’re discussing now. This is $38K that you could spend on supporting infrastructure for your DR site – eg: 10GbE switching and SAN’s, which we haven’t accounted for at all. Granted $38K doesn’t buy a lot of that equipment – but it SURE is better than starting at $0. You just got handed a $40K coupon.

So, when you feel the urge to ask "but what should we do with this old hardware, can't we do anything with it?" – the answer is "Yes, we can throw it away". You'll save money all day long. Give it to the keeners in your environment who want a home lab and let them learn and explore. If you really have no one interested… drop me a line. I ALWAYS have room in my lab or know someone looking. I'll put it to use somewhere in the community.

Categories: Dell, Design, Hardware, VMware, vSphere

HOWTO: Fix Dell Lifecycle Controller Update issues

September 29, 2014 2 comments

Let’s say you’re in the middle of upgrading some Dell 11g hosts. They all have iDRAC 6 and Lifecycle Controllers, and you’ve downloaded the latest SUU DVD for this quarter. Then you want to update everything. You reboot the host, you press F10 to enter the LCC, you tell it to use the Virtual Media mounted SUU DVD ISO that it recognizes, it finds your updates, and you say go… only to get this:

clip_image001

Uh. So who authorizes them, because this is from a Dell SUU DVD, that’s about as good as I can get.

Turns out, I’m not the first person to have this problem, though it’s an older issue:

http://www.sysarchitects.com/solved-updates-you-are-trying-apply-are-not-dell-authorized-updates

http://en.community.dell.com/support-forums/servers/f/177/t/19475476

http://frednotes.wordpress.com/2012/11/21/the-updates-you-are-trying-to-apply-are-not-dell-authorized-updates/

It looks like the issue is that the LCC is at 1.4.0.586 currently – and needs to be 1.5.2 or better. 1.6.5.12 is current as of my SUU DVD, as you can see above. The other problem is that Dell provides updates "in OS" for Linux and Windows – which doesn't really help ESXi hosts at all. It seems the solution for this is an "OMSA Live CD", which I'd never heard of until today. This can be found at: http://linux.dell.com/files/openmanage-contributions/om74-firmware-live/ and really good instructions on its use are at: http://en.community.dell.com/techcenter/b/techcenter/archive/2014/03/20/centos-based-firmware-images-with-om-7-4-with-pxe

Now, the other alternative should have been to mount the SUU ISO as a Virtual Media ISO and boot from it. But for whatever reason, this isn't working, and after selecting it, it just boots the HDD. I'm assuming this is because the firmware on the iDRAC/LCC is too old and having some issues booting the ISO. That's fine. I didn't troubleshoot it too much after it failed 3 times in a row. I dislike hardware reboots that take 10 minutes, which is why I like VM's, so I went looking for an alternative solution, and was happy with it.

When the system boots from the OM74 Live CD, it will auto-launch the update GUI:

clip_image002

Right now, you only need to do the Dell Lifecycle Controller. You could of course do more, but the point for me is to get the LCC working, then move back to doing the updates via that interface. So we’ll ONLY do the one update from here.

Click UPDATE FIRMWARE, and then:

clip_image003

Click UPDATE NOW in the upper left corner. You can see the STATUS DESCRIPTION showing it is being updated.

When the update is complete, you can then reboot the system and retry using the Unified Server Configurator/Lifecycle Controller to complete the rest of your updates. (HOWTO: Using Dell iDRAC 7 Lifecycle Controller 2 to update Dell PowerEdge R420, R620, and R720 servers would be a good place to look.)

Design Exercise- Scaling up vs Scaling out

September 28, 2014 3 comments

Most people who know me, know that I have a thing for optimization and efficiency in the data center. One way I like to do so is by scaling up vs scaling out – and wanted to show an example of how this could work for you, if this is an option.

Recently, I had a client ask to replace some of their older servers and refresh their cluster. As they were still running (very tightly) on 4x Dell PowerEdge 2950’s with 32GB of RAM, their needs were clearly not super extensive. But they needed new hardware. For the sake of argument, let’s assume they also required licences – suppose their support agreements expired some time ago (by choice or omission, it doesn’t really matter). So we need new licences. The client knows they need newer/better hosts, and also “wants room for growth”. All well and good.

The request was to purchase 5x Dell PowerEdge R620 servers. Dual 6 core CPU (CPU usage sits about 30-40% for these users) and 128GB RAM (8x16GB) to keep the costs down. All systems would be diskless, booting via SD/USB, so no RAID controllers or local disks. Quad 10GbE NIC’s would be required, 2x for Data networks and 2x for Storage. Pretty basic stuff.

First, the general costs. Dell.ca still lets me build 12G servers, so let’s build one out with the above specs:

  • $7200 – 2x Intel E5-2630L 60w 6 Core 2.4GHz CPU, 8x16GB DDR3, 2xSD, 2x1GbE+2x10GbE Broadcom on board, 2x 10GbE Broadcom add-in card, redundant PSU

You may certainly choose different configurations, I simply chose one that gave me a good baseline here for an example.  Only the memory and potentially the CPU’s are changing throughout.

If we were to go ahead, as requested, we would need:

  • 5x $7200 hosts as above = $36,000
  • 10x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $35,000
  • 5x Windows Server Data Center licences @ ~ $5000/each = $25,000
  • 10x Sockets of Veeam B&R licences @ ~ $1000/each = $10,000
  • 20x SFP+ TwinAx cables @ ~ $50/each = $1000

Total Cost = $107,000
Total Resources = 144GHz CPU, 640GB RAM

But what if you could do it with fewer hosts? They'd need to be beefier, for sure. But as this site only runs at 30-40% CPU load, we can increase the RAM, leave the CPU's the same, and obtain better density. If we re-price the configuration with 16x16GB for 256GB total, we get a price of $9300. With 24x16GB for 384GB total, we get a price of $11,500. The first reaction to this is usually something like "the hosts are 50% more, we can't afford that" – which usually fails to acknowledge that you no longer need as many. Let's do the same math as above, but with both new options:

256GB:

  • 4x $9300 hosts as above = $37,200
  • 8x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $28,000
  • 4x Windows Server Data Center licences @ ~ $5000/each = $20,000
  • 8x Sockets of Veeam B&R licences @ ~ $1000/each = $8,000
  • 16x SFP+ TwinAx cables @ ~ $50/each = $800

Total Cost = $94,000
Total Resources = 115GHz CPU, 1024GB RAM

384GB:

  • 3x $11,500 hosts as above = $34,500
  • 6x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $21,000
  • 3x Windows Server Data Center licences @ ~ $5000/each = $15,000
  • 6x Sockets of Veeam B&R licences @ ~ $1000/each = $6,000
  • 12x SFP+ TwinAx cables @ ~ $50/each = $600

Total Cost = $77,000
Total Resources = 86.4GHz CPU, 1152GB RAM
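All three options boil down to the same little formula, which also makes it easy to test other host counts or memory configurations (note the 3-node option actually totals $77.1K before rounding):

    # Cluster cost: hosts + per-socket vSphere/Veeam + per-host Windows Data Center + TwinAx.
    def cluster_cost(hosts, host_price):
        sockets = hosts * 2
        return (hosts * host_price + sockets * 3500 + hosts * 5000
                + sockets * 1000 + hosts * 4 * 50)        # 4x SFP+ TwinAx per host @ $50

    for label, hosts, price, ram_gb in [("5x 128GB", 5, 7200, 128),
                                        ("4x 256GB", 4, 9300, 256),
                                        ("3x 384GB", 3, 11500, 384)]:
        ghz = hosts * 2 * 6 * 2.4                          # dual 6-core at 2.4GHz
        print(f"{label}: ${cluster_cost(hosts, price):,} "
              f"for {ghz:.1f}GHz / {hosts * ram_gb}GB RAM")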

We’ve managed to potentially save $30,000, 2U of rack space, and a bunch of licencing in this scenario. There are some things to consider, however:

1) CPU requirements

What IF you couldn't tolerate the CPU resource drop? If your environment is running at 80%+ CPU usage, then memory scaling isn't going to help you. The first thing to check, though, would be whether you have a bunch of VM's that don't have VMware Tools installed and/or are constantly pinning the CPU because of some errant task. Fix that, and you may find you don't need the CPU you thought you did.

Upgrading to E5-2670 v2 2.5GHz 10 Core CPU’s brings the cost up to $13,500 per host or $2000 extra. But you go from 28.8GHz to 50GHz per host – or 150GHz for all 3 nodes. So you ‘only’ save $24,000 in this example then – still 22%

2) RAM gotchas

Check that populating all the memory slots doesn’t drop the memory speeds, as some configurations do. In this case, many environments I’ve seen would still ‘prefer’ to have MORE RAM than FASTER RAM. That may measure out to be untrue, but when internal customers are asking for double the memory per VM, they don’t care about how fast but how much. So you need to look at IF this will occur and IF you care.

3) N+1 sizing.

Remember you want to do maintenance. If you have 5 hosts and take out 1, you still have 4. If you have 3, you only have 2 left. Do you still have enough capacity? Let's look at max and N+1 sizes:

5 Node = 144GHz CPU / 640GB or 115GHz / 512GB in maintenance

4 Node = 115GHz CPU / 1024GB or 86GHz / 768GB in maintenance

3 Node = 86GHz CPU / 1152GB or 57.6GHz / 768GB in maintenance.

So again, assuming RAM is your key resource, in either of the 4 or 3 node situations, you actually still exceed the 5 node cluster capacity at 100% health, while in maintenance.
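And the N+1 numbers above, in the same sketch form:

    # N+1 check: capacity with one host in maintenance, per option.
    GHZ_PER_HOST = 2 * 6 * 2.4          # dual 6-core at 2.4GHz = 28.8GHz

    for hosts, ram_per_host in [(5, 128), (4, 256), (3, 384)]:
        print(f"{hosts} nodes: full {hosts * GHZ_PER_HOST:.1f}GHz / {hosts * ram_per_host}GB, "
              f"N+1 {(hosts - 1) * GHZ_PER_HOST:.1f}GHz / {(hosts - 1) * ram_per_host}GB")
    # The 3- and 4-node options in maintenance still carry more RAM (768GB)
    # than the 5-node option does when fully healthy (640GB).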

4) Expandability

In my example, the 24 DIMM slot servers are maxed out from day one. The 5 node solution has 16 free slots, and could be upgraded as required. However, it has already cost $30,000 more, so that additional memory will only raise that delta. I've seen many environments where the attitude was "we'll just add a host" – which fails to consider the licencing, ports, sockets, rack space, etc., required to "just add one more" – and it's seldom just one more, when you start playing that game. I've seen environments where people had 10 hosts with 128GB or 192GB and wanted to add 5 more – rather than just replace the first 10 hosts with something with better density.

5) Lifetime maintenance costs

Now that you’ve managed to reduce the size of your cluster by 40%, that 40% is saved year after year. Consider all the things that might need to be done on a “per host” basis – patching, firmware updates, preventative maintenance, cabling, labelling, documentation, etc. Every time you do something “cluster wide”, you’ve likely reduced that workload by some amount.

This same example works just as well if your original cluster needed to be 10 or 15 nodes – and you chose to make it only 6 or 9. So this isn’t just a play for the “little guy” – and if environments supporting 1TB of RAM is “the little guy”….

Now, one thing I see, which is unfortunately part of the “corporate budget game” is the whole “well if we don’t use the budget, we’ll lose it” scenario. It drives me up the wall, but I get it, and it exists. So let’s assume you absolutely HAD to spend that original $107K, and burn through the $30K – and what could you do with it:

  • Spend some money on a DR “host” with DAS. It wouldn’t run everything, but it could be better than what you don’t have today
  • Maybe VSAN starts being a plausible option? One argument I tend to hear against it, is cost. At $7500/node, 3 is much less than 5 – and you could likely pay for your disks.
  • Host based caching like PernixData could be something you throw in, to drastically assist with storage IO performance
  • Maybe those 10GbE switches you’ve been wanting can finally be bought – heck, this basically makes them “buy the servers, the switches are free”
  • Training and consulting. Maybe you could now afford to send some guys on training. Or pay to get it installed if your team doesn’t have the skillsets in house

Something to keep in mind.

Categories: Dell, Design, Hardware, VMware

Dell announces 13G PowerEdge

September 8, 2014 Leave a comment

Everyone knows I like me some Dell rack mount servers, and the PowerEdge line was updated today it looks like. 

A collection of links if you will:

Dell’s Community TechCenter Wiki:
http://en.community.dell.com/techcenter/extras/w/wiki/7520.dell-13th-generation-poweredge-server-resources?dgc=SM&cid=259733&lid=5354034

Direct2Dell Blog Update:
http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2014/09/08/today-39-s-the-day-links-to-blogs-videos-and-more-on-the-13th-generation-of-dell-poweredge-servers

Some of the major highlights that I find interesting:

  • Availability of the R630, R730, and R730xd rack servers, the T630 tower, and the M630 blade.
  • Intel E5 v3 Xeons, with up to 18 cores (up from 12 previously)
  • DDR4, and 24 DIMM slots (still, unfortunately – here’s hoping 32-64GB DDR4 starts becoming more readily available – and affordable)
  • 1.8” SATA SSD’s, providing 2.4x the IOPS compared to the same footprint 2.5” SSD’s.
  • PERC9 storage controllers
  • OEM partnership with SanDisk (FusionIO) for a ton of SSD acceleration software solutions (http://www.theregister.co.uk/2014/09/08/dell_stuffing_servers_with_sandisk_caching_software/)
  • iDRAC Direct for USB based updating of servers – previously only really available as a virtual ISO type of solution.
  • iDRAC Automatic Configuration, using a central repository

I’ve been very fond of the R610 and R620 solutions, and it looks like the R630 is a winner.  (More details at http://www.dell.com/us/business/p/poweredge-r630/pd).  I love to put these up against blade solutions, where density is important.  Yes, R730’s could be used in a 2U form factor, but if your other option is a blade server, truly you’re competing with a 1U highly dense server.  Let’s look at what you can do with an R630:

  • 2x Intel E5-2660 v3 10 Core / 20 Thread CPU @ 2.6GHz and 105W.  Personally I'd prefer to use the E5-2650L v3 12C/24T, but at 1.8GHz I've run into too many pieces of software that 'care' about the speed of the CPU (I'm looking at you Cisco, with your WebEx/UC solutions that demand more CPU than they ever actually use, and are barely supported in a virtual environment), which makes me want to suggest a higher GHz CPU
  • 24x 16GB DDR4 for 384GB total
  • Chassis with up to 24x 1.8” SSD.  Likely I wouldn’t use this, but one with the 10x 2.5” chassis and no optical.  But just imagine 24x 1.8” 200GB SSD’s as a VSAN….
  • 1x 120GB SSD – can’t hurt to have one around for VFRC or PernixData type solutions, and the configurator makes you pick at least one HDD.  At $260, with warranty, why not. 
  • 1x PERC H330 – don’t need anything fancy if we intend to be diskless
  • 4x Broadcom 57840S 10GbE SFP+ onboard, or upgrade to Emulex OneConnect OCm14104-U1-D port for $80 more (here’s hoping they’re better than they are in IBM servers, but I blame the IBM UEFI implementation for that)
  • Dual Hot Plug 750W Power Supply
  • Dual SD with 16GB SD cards and ESXi embedded
  • iDRAC8 Enterprise with vFlash 16GB
  • 3 Year Warranty

Web price comes to $18,162, and I've seen customers get 30-40% off depending on the time of year, etc.  So let's suggest 35% – that's $12,000 or so for 20 cores, 384GB, and 4x 10GbE.  That's a hell of a server in 1U.  24 months ago, I bought similar R620's with 16 cores, 256GB, and 2x 10GbE LOM for $9700 – and these would get significantly better performance, as those were only v1 E5's, not even v2's.  Also so much better than 12-16 DIMM slot blade servers that just don't get any real density!

Now if you’re doing any sort of VSAN, or even just have local reasons to want DAS solutions, check out the R730XD server – especially this one:

At first glance, I thought “great, another 8x 3.5” chassis with wasted space for vents”.  Look closer.  The top 1/3 is 18x 1.8” SSDs.  Granted, at some point the PCIe-based RAID controller becomes the bottleneck compared to PCIe SSDs – but that’s largely a throughput limit, not an IOPS one.  I have no doubt that this configuration could drive a ton of performance.  It is a bit of a shame it can still only drive 2x NVIDIA GRID K2 cards, but that’s largely a limitation of 2U servers in general.  Still, if it could have done 3… that would have been even more amazing.

I can’t wait to get my hands on some of these – I’m pretty excited to actually put these features to use!

Categories: 13G, Dell

HOWTO: Using Dell iDRAC 7 Lifecycle Controller 2 to update Dell PowerEdge R420, R620, and R720 servers

April 3, 2014 3 comments

Newer Dell servers include, as part of iDRAC7 (integrated Dell Remote Access Controller), a “Lifecycle Controller” v2.  This feature can be used to update all firmware on a Dell server with minimal stress and interaction.  However, with Lifecycle Controller firmware prior to v1.3.0.x, there is no way to set a VLAN on the NICs.  These NICs are the LOMs (LAN on Motherboard), and in my vSphere environment they are trunked from the switch to allow VLAN tagging at the host.  So when you reboot into the LCC with no VLAN tagging option, you’re going to find that the “Update via FTP/Network Share” option doesn’t work as well as you might hope.  Whether you’re doing this manually/interactively or via the Dell OpenManage Integration for VMware vCenter plug-in, you’ll need to fix this first.

This post will guide you through applying all current updates via an ISO-based Lifecycle Controller update. After that, you’ll be able to do your next round over the network – which I’ll cover in another post.

Prerequisites/Assumptions:

1) You have acquired the latest “Q# Server Update Utility DVD ISO” – currently (as of March 11, 2014) v7.4, dated 2/1/2014 – http://www.dell.com/support/drivers/us/en/19/DriverDetails/Product/poweredge-r720?driverId=4V8PP&osCode=WS8R2&fileId=3338639762&languageCode=en&categoryId=SM

NOTE: if you do NOT have this ISO, expect the download to take a while – at 500KB/sec, the 8.4GB image works out to roughly 4.5-5 hours

2) You will NOT be using network-based Lifecycle Controller updates with a central FTP or SMB share, and you will NOT be using Dell Repository Manager to create such a repository – you will be using the SUU media above.

3) The existing iDRAC on the ESXi host in question is functioning normally – allowing both Remote Console and Virtual Media

4) You are able to place the ESXi host(s) into maintenance mode – either with zero downtime via DRS in a cluster, or in a maintenance window for a standalone host (you will be rebooting the host, so no VMs or host operations can occur during the update)

Process:

1) Place the host in Maintenance Mode in vSphere and evacuate all VMs.  Manually resolve any VM migration issues as required.
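If you’d rather script this step (handy when you have a whole cluster to work through), a minimal pyVmomi sketch along these lines should do it – the vCenter name, host name, and credentials below are placeholders:

```python
# Minimal pyVmomi sketch: put one host into maintenance mode.
# vCenter/host names and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only: skips certificate checks
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "esxi01.example.local")

    # With DRS in fully automated mode, entering maintenance mode will
    # evacuate the running VMs; wait up to an hour for that to finish.
    task = host.EnterMaintenanceMode_Task(timeout=3600)
    print("Maintenance mode task started:", task.info.key)
finally:
    Disconnect(si)
```

You can watch the task in the vSphere client, or poll task.info.state if you’re scripting the whole run.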

2) Connect to the iDRAC while the host is entering Maintenance Mode and Launch the Virtual Console.

clip_image002

3) From the Virtual Console, change NEXT BOOT to LIFECYCLE CONTROLLER:

clip_image004
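As an aside, if you don’t want to click through the console for this, the same thing can likely be done from the command line with remote racadm.  Treat the attribute names and the “F10” value below as assumptions on my part – they’re from memory of the iDRAC7/8 RACADM attribute registry, so check them against the RACADM Command Line Reference for your firmware before relying on them:

```python
# Hedged sketch: ask the iDRAC to boot into the Lifecycle Controller on the
# next restart, using remote racadm via subprocess.  iDRAC address and
# credentials are placeholders; the ServerBoot attribute names and the "F10"
# value are assumptions to verify against your RACADM reference.
import subprocess

IDRAC_IP, IDRAC_USER, IDRAC_PASS = "192.168.10.120", "root", "********"

def racadm(*args):
    """Run a remote racadm command against the iDRAC and return its output."""
    cmd = ["racadm", "-r", IDRAC_IP, "-u", IDRAC_USER, "-p", IDRAC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# BootOnce so it only applies to the next restart; F10 is believed to be the
# "boot to Lifecycle Controller" value.
print(racadm("set", "iDRAC.ServerBoot.BootOnce", "Enabled"))
print(racadm("set", "iDRAC.ServerBoot.FirstBootDevice", "F10"))
```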

4) From the Virtual Console, click on VIRTUAL MEDIA -> LAUNCH VIRTUAL MEDIA:

clip_image006

5) Click ADD IMAGE, and locate your downloaded ISO:

clip_image008

Check the box for MAPPED to ensure the ISO is mapped to the host
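If the SUU ISO already lives on a file server, an alternative worth knowing about is having the iDRAC attach it directly from a CIFS/NFS share with “racadm remoteimage”, rather than pushing 8.4GB through the Java console from your workstation.  I haven’t verified the exact option letters against every iDRAC7 firmware, so treat this as a sketch and confirm it in your RACADM reference; the share path, addresses, and credentials are placeholders:

```python
# Hedged sketch: attach the SUU ISO from a CIFS share as iDRAC virtual media
# using remote racadm.  Paths, addresses, and credentials are placeholders;
# verify the remoteimage options against your iDRAC's RACADM reference.
import subprocess

IDRAC_IP, IDRAC_USER, IDRAC_PASS = "192.168.10.120", "root", "********"

def racadm(*args):
    cmd = ["racadm", "-r", IDRAC_IP, "-u", IDRAC_USER, "-p", IDRAC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# -c = connect the image, -l = image location on the share,
# -u/-p = credentials for the share itself (not the iDRAC).
print(racadm("remoteimage", "-c",
             "-l", "//fileserver.example.local/isos/SUU_740.iso",
             "-u", "exampledomain\\svc_dell", "-p", "********"))

# When you're finished, detach it again with: racadm remoteimage -d
```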

6) By now, your ESXi host should be in maintenance mode.  If so, right-click and reboot the host.
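The reboot can also be scripted; a small pyVmomi sketch (again with placeholder names and credentials) that only reboots the host once it actually reports being in maintenance mode might look like this:

```python
# Minimal pyVmomi sketch: reboot the host, but only if it is already in
# maintenance mode.  Names and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only: skips certificate checks
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "esxi01.example.local")

    if host.runtime.inMaintenanceMode:
        host.RebootHost_Task(force=False)   # graceful reboot request
    else:
        print("Host is not in maintenance mode yet - not rebooting.")
finally:
    Disconnect(si)
```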

7) When the server reboots, it will boot automatically into the Lifecycle Controller, likely to the Network Setup screen:

clip_image010

Note that while you can pick any of the add-in or onboard NICs, you are not able to select the iDRAC NIC, nor are you able to specify a VLAN ID.  As our vSphere-facing ports are all trunked and require VLAN tags, this prevents us from using network-based Lifecycle Controller updates.  This is corrected in newer Lifecycle Controller firmware – as we’ll see later, the VLAN option appears after this update.  Click CANCEL to exit the Lifecycle Controller Network Setup.

8) From the Lifecycle Controller 2 HOME screen, click FIRMWARE UPDATE:

clip_image012

9) Choose LAUNCH FIRMWARE UPDATE:

clip_image014

10) On Step 1 of 3, choose LOCAL DRIVE (CD or DVD or USB):

clip_image016

Click NEXT.

11) On Step 2 of 3, choose the local (VIRTUAL CD) drive:

clip_image018

Click NEXT.

12) Wait while it VERIFIES SELECTION:

clip_image020

13) You will now see a list of available updates for the components present in the system, showing both their current and available versions.

clip_image022

I would recommend installing ALL available updates to bring everything to the most current version.  Note that it indicates “System will reboot after selected updates are applied”.  Conveniently, because we used the “Next Boot” option to get into the Lifecycle Controller, when it reboots it should boot normally back into ESXi.  Click APPLY.

14) You will then see it copying the updates to the local flash so it can perform the updates without the Virtual Media.

clip_image024

clip_image026

The next screen, AUTOMATED TASK APPLICATION, shows the progress of the update(s).

clip_image028

Note that the system automatically rebooted before all updates had completed – likely due to the update of the Lifecycle Controller itself.  It then automatically selected “ENTERING LIFECYCLE CONTROLLER”, as shown, to continue the update process.

clip_image030

And the remaining updates continued as expected…

clip_image032

Admittedly, I was surprised to see the system restart back into the Lifecycle Controller partway through the process.

15) If you now rescan for FIRMWARE UPDATES from the Virtual Media ISO, you should see that Current matches Available and all components are unchecked as they do not require updates. 

clip_image034

This confirms that we are largely done with the update process.
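For a belt-and-suspenders check (or to record the “after” state for change control), remote racadm should also be able to dump the firmware inventory that the Lifecycle Controller tracks, without booting back into it.  The “getversion” subcommand is the one I believe does this on iDRAC7 with LCC2 – verify it against your RACADM reference; the address and credentials below are placeholders:

```python
# Hedged sketch: dump the firmware inventory via remote racadm so the
# post-update versions can be recorded.  Address and credentials are
# placeholders, and "getversion" is assumed to be available on this iDRAC.
import subprocess

IDRAC_IP, IDRAC_USER, IDRAC_PASS = "192.168.10.120", "root", "********"

out = subprocess.run(
    ["racadm", "-r", IDRAC_IP, "-u", IDRAC_USER, "-p", IDRAC_PASS, "getversion"],
    capture_output=True, text=True, check=True).stdout

# Expect to see BIOS, iDRAC, Lifecycle Controller, NIC, PERC, etc. versions,
# which you can compare against what the SUU just applied.
print(out)
```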

16) Return to the main menu and enter SETTINGS -> NETWORK SETTINGS:

clip_image036

clip_image038

You will now note that there is a VLAN setting.  This will allow us to use the network instead of Virtual Media for later updates.  Equally, since the Lifecycle Controller can now be reached over the network, this update process can largely be driven from the Dell OpenManage Integration for VMware vCenter and/or a Dell OpenManage Essentials server.

clip_image040

You should see this screen if the VLAN/DHCP settings worked as expected. 

As this HOWTO is intended to cover performing the updates via the Virtual Media ISO, this is where we will stop for now.  A later post will cover network-based Lifecycle Controller updates as well as automation with Dell OpenManage Essentials (DOME).  Reboot the server and verify it boots back into ESXi as expected.

clip_image042

Press ESC and then YES to Exit and Reboot.

Sometime soon, I’ll post how to create a Dell SUU network repository, and then how to perform an LCC update interactively using that network location.