Archive

Archive for the ‘Hardware’ Category

Modifying the Dell C6100 for 10GbE Mezz Cards

June 11, 2015 5 comments

In a previous post, Got 10GbE working in the lab – first good results, I talked about getting 10GbE working with my Dell C6100 series.  Recently, a commenter asked me if I had any pictures of the modifications I had to make to the rear panel to make these 10GBE cards work.  As I have another C6100 I recently acquired (yes, I have a problem…), that needs the mods, it seems only prudent to share the steps I took in case it helps someone else.

First a little discussion about what you need:

  • Dell C6100 without the rear panel plate to be removed
  • Dell X53DF/TCK99 2 Port 10GbE Intel 82599 SFP+ Adapter
  • Dell HH4P1 PCI-E Bridge Card

You may find the Mezz card under either part number – it seems that the X53DF replaced the TCK99.  Perhaps one is the P/N and one is the FRU or some such.  But you NEED that little PCI-E bridge card.  It is usually included, but pay special attention to the listing to ensure it does.  What you DON’T really need, is the mesh back plate on the card – you can get it bare. 

2015-06-11 21.18.132015-06-11 21.17.46

Shown above are the 2pt 10GbE SFP+ card in question, and also the 2pt 40GbE Infiniband card.  Above them both is the small PCI-E bridge card.

2015-06-11 21.19.24

You want to remove the two screws to remove the backing plate on the card.  You won’t be needing it, and you can set it aside.  The screws attach through the card and into the bracket, so once removed, reinsert the screws to the bracket to keep from losing them.

2015-06-11 21.17.14

Here we can see the back panel of the C6100 sled.  Ready to go for cutting.

2015-06-11 21.22.232015-06-11 21.24.48

You can place the factory rear plate over the back plate.  Here you can see where you need to line it up and mark the cuts you’ll be doing.  Note that of course the bracket will sit higher up on the unit, so you’ll have to adjust for your horizontal lines. 

2015-06-11 21.23.092015-06-11 21.22.49

If we look to the left, we can see the source of the problem that causes us to have to do this work.  The back panel here is not removable, and wraps around the left corner of the unit.  In systems with the removable plate, this simply unscrews and panel attached to the card slots in.  In the right hand side you can see the two screws that would attach the panel and card in that case.

2015-06-11 21.35.38

Here’s largely what we get once we complete the cuts.  Perhaps you’re better with a Dremel than I am. Note that the vertical cuts can be tough depending on the size of the cutting disk you have, as they may have interference from the bar to remove the sled. 

2015-06-11 21.36.162015-06-11 21.36.202015-06-11 21.36.28

You can now attach the PCI-E bridge card to the Mezz card, and slot it in.  I found it easiest to come at about 20 degree angle and slot in the 2 ports into the cut outs, then drop the PCI-E bridge into the slot.  When it’s all said and done, you’ll find it pretty secure and good to go.

That’s really about it.  Not a whole lot to it, and if you have it all in hand, you’d figure it out pretty quick.  This is largely to help show where my cut lines ended up compared tot he actual cuts and where adjustments could be made to make the cuts tighter if you wanted.  Also, if you’re planning to order, but are not sure if it works or is possible, then this is going to help out quite a bit.

Some potential vendors I’ve had luck with:

http://www.ebay.com/itm/DELL-X53DF-10GbE-DUAL-PORT-MEZZANINE-CARD-TCK99-POWEREDGE-C6100-C6105-C6220-/181751541002? – accepted $60 USD offer.

http://www.ebay.com/itm/DELL-X53DF-DUAL-PORT-10GE-MEZZANINE-TCK99-C6105-C6220-/181751288032?pt=LH_DefaultDomain_0&hash=item2a513890e0 – currently lists for $54 USD, I’m sure you could get them for $50 without too much negotiating.

Categories: C6100, Dell, Hardware, Home Lab

IBM RackSwitch–40GbE comes to the lab!

May 20, 2015 3 comments

Last year, I had a post about 10GbE coming to my home lab (https://vnetwise.wordpress.com/2014/09/20/ibm-rackswitch10gbe-comes-to-the-lab/).  This year, 40GbE comes! 

This definitely falls into the traditional “too good to pass up” category.  A company I’m doing work for picked up a couple of these, and there was enough of a supply that I was able to get my hands on a pair for a reasonable price.  Reasonable at least after liquidating the G8124’s from last year.  (Drop me a line, they’re available for sale! Smile)

Some quick high level on these switches, summarized from the IBM/Lenovo RedBooks (http://www.redbooks.ibm.com/abstracts/tips1272.html?open):

  • 1U Fully Layer 2 and Layer 3 capable
  • 4x 40Gbe QSFP+ and 48x 10GbE SFP+
  • 2x power supply, fully redundant
  • 4x fan modules, also hot swappable.
  • Mini-USB to serial console cable (dear god, how much I hate this non-standard part)
  • Supports 1GbE Copper Transceiver – no issues with Cisco GLC-T= units so far
  • Supports Cisco Copper TwinAx DAC cabling at 10GbE
  • Supports 40GbE QSFP+ cables from 10GTek
  • Supports virtual stacking, allowing for a single management unit

Front panel of the RackSwitch G8264

Everything else generally falls into line with the G8124.  Where those are listed as “Access” switches, these are listed as “Aggregation” switches.  Truly, I’ll probably NEVER have any need for this many 10GbE ports in my home lab, but I’ll also never run out.  Equally, I now have switches that match production in one of my largest environments, so I can get good and familiar with them.

I’m still on the fence about the value of the stacking.  While these are largely going to be used for ISCSI or NFS based storage, stacking may not even be required.  In fact there’s an argument to be made about having them be completely segregated other than port-channels between them, so as to ensure that a bad stack command doesn’t take out both.  Also the Implementing IBM System Networking 10Gb Ethernet Switches guide, it shows the following limitations:

When in stacking mode, the following stand-alone features are not supported:
Active Multi-Path Protocol (AMP)
BCM rate control
Border Gateway Protocol (BGP)
Converge Enhanced Ethernet (CEE)
Fibre Channel over Ethernet (FCoE)
IGMP Relay and IGMPv3
IPv6
Link Layer Detection Protocol (LLDP)
Loopback Interfaces
MAC address notification
MSTP
OSPF and OSPFv3
Port flood blocking
Protocol-based VLANs
RIP
Router IDs
Route maps
sFlow port monitoring
Static MAC address addition
Static multicast
Uni-Directional Link Detection (UDLD)
Virtual NICs
Virtual Router Redundancy Protocol (VRRP)

That sure seems like a lot of limitations.  At a glance, I’m not sure anything there is end of the world, but it sure is a lot to give up. 

At this point, I’m actually considering filling a number of ports with GLC-T’s and using that for 1GbE.  A ‘waste’, perhaps, but if it means I can recycle my 1GbE switches, that’s an additional savings.  If anyone has a box of them they’ve been meaning to get rid of, I’d be happy to work something out. 

Some questions that will likely get asked, that I’ll tackle in advance:

  • Come on, seriously – they’re data center 10/40GbE switches.  YES, they’re loud.  They’re not, however, unliveable.  They do quite down a bit after warm up, where they run everything at 100% cycle to POST.  But make no mistake, you’re not going to put one of these under the OfficeJet in your office and hook up your NAS to it, and not shoot yourself. 
  • Power is actually not that bad.  These are pretty green, and drop power to unlit ports.  I haven’t hooked up a Kill-a-Watt to them, but will tomorrow.  They’re on par with the G8124’s based on the amp display on the PDU’s I have them on right now. 
  • Yes, there are a couple more Winking smile  To give you a ballpark, if you check eBay for a Dell PowerConnect 8024F and think that’s doable – then you’re probably going to be interested.  You’d lose the 4x10GBaseT combo ports, but you’d gain 24x10GbE and 4x 40GbE.
  • I’m not sure yet if there are any 40GbE compatible HBA – just haven’t looked into it.  I’m guessing Mellanox ConnectX-3 might do it.  Really though, even at 10GbE, you’re not saturating that without a ton of disk IO. 

More to come as I build out various configurations for these and come up with what seems to be the best option for a couple of C6100 hosts. 

Wish me luck!

Categories: Hardware, Home Lab, IBM, RackSwitch

Design Exercise–Scaling Up–Real World Example

October 13, 2014 Leave a comment

My previous post on Design Exercise- Scaling up vs Scaling out appeared to be quite popular. A friend of mine recently told me of an environment, and while I have only rough details of it, it gives me enough to make a practical example of a real world environment – which I figured might be fun. He indicated that while we’d talked about the ideas in my post for years, it wasn’t until this particular environment that it really hit home.

Here are the highlights of the current environment:

  • Various versions of vSphere – v3.5, v4.x, v5.x, multiple vCenters
  • 66 hosts – let’s assume dual six core Intel 55xx/56xx (Nahelem/Westermere) CPU’s
  • A quick tally suggests 48GB of RAM per host.
  • These hosts are blades, likely HP. 16 Blades per chassis, so at least 4 chassis. For the sake of argument, let’s SAY it’s 64 hosts, just to keep it nice and easy.
  • Unknown networking, but probably 2x 10GbE, and 2x 4Gbit/FC, with passthru modules

It might be something very much like this. In which case, it might be dual 6 core CPU’s, and likely only using 1GbE on the front side. This is probably a reasonable enough assumption for this example, especially since I’m not trying to be exact and keep it theoretical.

http://www.ebay.ca/itm/HP-c7000-Blade-Chassis-16x-BL460c-G6-2x-6-C-2-66GHz-48GB-2x-146GB-2x-Gbe2c-2x-FC-/221303055238?pt=COMP_EN_Servers&hash=item3386b0a386

I’ve used the HP Power Advisor (http://www8.hp.com/ca/en/products/servers/solutions.html?compURI=1439951#.VDnvBfldV8E) to determine the power load for a similarly configured system with the following facts:

  • 5300 VA
  • 18,000 BTU
  • 26 Amps
  • 5200 Watts total
  • 2800 Watts idle
  • 6200 Watts circuit sizing
  • 6x 208V/20A C19 power outlets
    clip_image001

We’ll get to that part later on. For now, let’s just talk about the hosts and the sizing.

Next, we need to come up with some assumptions.

  • The hosts are likely running at 90% memory and 30% CPU, based on examples I’ve seen. Somewhere in the realm of 2764GB of RAM and 230 Cores.
  • The hosts are running 2 sockets of vSphere Enterprise Plus, with SnS – so we have 128 sockets of licences. There will be no theoretical savings on net-new licences as they’re already owned – but we might save money on SnS. There is no under-licencing that we’re trying to top up.
  • vSphere Enterprise Plus we’ll assume to be ~ $3500 CAD/socket and 20% for SnS or about $700/year/socket.
  • The hosts are probably not licenced for Windows Data Center, given the density – but who knows. Again, we’re assuming the licences are owned, so no net-new savings but there might be on Software Assurance.
  • We’re using at least 40U of space, or a full rack for the 4 chassis
  • We’re using 20,800 Watts or 21 kWhr
  • While the original chassis are likely FC, let’s assume for the moment that it’s 10gbE ISCSI or NFS.

Now, let’s talk about how we can replace this all – and where the money will come from.

I just configured some Dell R630 1U Rack servers. I’ve used two different memory densities to deal with some cost assumptions. The general and common settings are:

  • Dell R630 1U Rack server
  • 2x 750 Watt Power Supply
  • 1x 250GB SATA just have “a disk”
  • 10 disk 2.5” chassis – we won’t be using local disks though.
  • 1x PERC H730 – we don’t need it, but we’ll have it in case we add disks later.
  • Dual SD module
  • 4x Emulex 10GbE CNA on board
  • 2x E5-2695 v3 2.3GHz 14C/28T CPU’s

With memory we get the following numbers:

  • 24x 32GB for 768GB total – $39.5 Web Price, assume a 35% discount = $26K
  • 24x 16GB for 368GB total – $23.5 Web Price, assume a 35% discount = $15.5K

The first thing we want to figure out is if the memory density is cost effective. We know that 2x of the 384GB configs would come to $31K or $6K more than the 2x servers. So even without bothering to factor for licencing costs, we know it’s cheaper. If you had to double up on vSphere, Windows Data Center, Veeam, vCOPS, etc, etc, then it gets worse. So very quickly we can make the justification to only include the 768GB configurations. So that’s out of the way. However, it also tells us that if we need more density, we do have some wiggle room to spend more on better CPU’s with more cores/speeds – we can realistically spend up to $3K/CPU more and still work out to be the same as doubling the hosts with half the RAM.

Now how many will we need? We know from above “Somewhere in the realm of 2764GB of RAM and 230 Cores”. 230 cores / 28 cores per server means we need at least 8.2 hosts – we’ll assume 9.; 2764GB of RAM, only requires 3.6 hosts. But we also need to assume we’ll need room for growth. Based on these numbers, let’s work with the understanding we’ll want at least 10 hosts to give us some overhead on the CPU’s, and room for growth. If we’re wrong, we have lots of spare room for labs, DEV/TEST, finally building redundancy, expanding poorly performing VM’s, etc. No harm In that. This makes the math fairly easy as well:

  • $260K – 10x Dell R630’s with 768GB
  • $0 – licence savings from buying net new

We’ve now cost the company, $260K, and so far, haven’t shown any savings or justification. Even just based on hardware refresh and lifecycle costs, this is probably a doable number. This is $7.2K/month over 36 months.

What if we could get some of that money back? Let’s find some change in the cushions.

  • Licence SnS savings. We know we only need 20 sockets now to licence 10 hosts, so we can potentially let the other 108 sockets lapse. At $700/socket/year this results in a savings of $75,600 per year, or $227K over 36 months. This is 87% of our purchase cost for the new equipment. We only need to find $33K now
  • Power savings.
    clip_image002
    The Dell Energy Smart Solution Advisor (http://essa.us.dell.com/dellstaronline/Launch.aspx/ESSA?c=us&l=en&s=corp) suggests that each server will require 456Watts, 2.1 Amps and 1600 BTU of cooling. So our two solutions look like
    clip_image003
    I pay $0.085/kWhr here so I’ll use that number. In the co-location facilities I’m familiar with, you’re charged per power whip not usage. But as this environment is on site, we can assume they’re being charged only as used.
    We’ve now saved another $1K/month or $36K over 36 months. We have saved $263K on a $260K purchase. How am I doing so far?
  • I

  • Rack space – we’re down from 40U to 10U of space. Probably no cost savings here, but we can reuse the space
  • Operational Maintenance – we are now doing Firmware, Patching, Upgrades, Host Configuration, etc, across 10 systems vs 64. Regardless of if that time accounts for 1 or 12 hours per year per server, we are now doing ~ 84% less work. Perhaps now we’ll find the time to actually DO that maintenance.

So based on nothing more than power and licence *maintenance*, we’ve managed to recover all the costs. We also have drastically consolidated our environment, we can likely “finally” get around to migrating all the VM’s into a single vSphere v5.5+ environment and getting rid of the v3.5/v4.x/etc mixed configuration that likely was left that way due to “lack of time and effort”.

We also need to consider the “other” ancillary things we’re likely forgetting as benefits. Everyone one of these things that a site of this size might have, represents a potential savings – either in net-new or maintenance:

  • vCloud Suite vs vSphere
  • vCOPS
  • Veeam or some other backup product, per socket/host
  • Window Server Data Center
  • SQL Server Enterprise
  • PernixData host based cache acceleration
  • PCIe/2.5” SSD’s for said caching

Maybe the site already has all of these things. Maybe they’re looking at it for next year’s budget. If they have it, they can’t reduce their licences, but could drop their SnS/Maintenance. If they’re planning for it, they now need 84% less licencing. My friends in sales for these vendors won’t like me very much for this, I’m sure, but they’d also be happy to have the solution be sellable and implemented and a success story – which is always easier when you don’t need as many.

I always like to provide more for less. The costs are already a wash, what else could we provide? Perhaps this site doesn’t have a DR site. Here’s an option to make that plausible:

  • $260K – 10x R630’s for the DR site
  • $0K – 20 sockets of vSphere Enterprise – we’ll just reuse some of the surplus licencing. We will need to keep paying SnS though.
  • $15K – 20 sockets of vSphere Enterprise SnS
  • $40K – Pair of Nexus 5548 switches? Been a while since I looked at pricing
    Spend $300K and you have most of a DR environment – at least the big part. You still have no storage, power, racks, etc. But you’re far closer. This is a much better use of the same original dollars. The reason for this part of the example is because of the existing licences and we’re not doing net-new. The question of course from the bean-counters will be “so what are we going to do, just throw them away???”

Oh. Right. I totally forgot. Resale J

http://www.ebay.ca/itm/HP-C7000-Blade-Enclosure-16xBL460C-G6-Blades-2xSix-Core-2-66GHZ-X5650-64GB-600GB-/271584371114?pt=COMP_EN_Servers&hash=item3f3bb0a1aa

There aren’t many C7000/BL460C listed as “Sold” on eBay, but the above one sold for ~ $20K Canadian. Let’s assume you chose to sell the equipment to a VAR that specializes in refurbishing – they’re likely to provide you with 50% of that value. That’s another $10K/chassis or $40K for the 4 chassis’.

As I do my re-read of the above, I realize something. We need 9 hosts to meet CPU requirements, but we’d end up with 7680GB of RAM where we only really require 2764GB today. This brings the cost down to ~ $31K Web Price or $20K with 35% discount. At a savings of $6K/server, we’d end up with 5120GB of RAM – just about double what we use today, so lots of room for scale up. We’ll save another $60K today. In the event that we ever require that capacity, we can easily purchase the 8*32GB/host at a later date – and likely at a discount as prices drop over time. However – often the original discount is not applied to parts and accessory pricing for a smaller deal, so consider if it actually is a savings. How would you like a free SAN? J Or 10 weeks of training @ $6K each? I assume you have people on your team who could benefit from some training? Better tools? Spend your money BETTER! Better yet, spend the money you’re entrusted to be the steward of, better – it’s not your money, treat it with respect.

A re-summary of the numbers:

  • +$200K – 10x R630’s with 512GB today
  • +$0K – net-new licencing for vSphere Enterprise Plus
  • -$227K – 108 sockets of vSphere SnS we can drop, over 3 years.
  • -$36K – Power savings over 3 years
  • -$40K – Resale of the original equipment

Total: $103K to the good.

 

Footnote: I came back thinking about power.  The Co-Location facility I’ve dealt with charges roughly:

  • $2000/month for a pair of 208V/30A circuits
  • $400/month for a pair of 110V/15A circuits
  • $Unknown for a pair of 20A circuits, unfortunately.

I got to thinking about what this environment would need – but also what it has.  In my past, I’ve seen a single IBM Blade Center chassis using 4x 208V/30A circuits, even if it could have been divided up better.  So let’s assume the same inefficiency was done here.  Each HP C-Series chassis at 25.4A would require 3x Pairs, or 12x Pairs for the total configuration – somewhere in the area of $24,000/month in power.  Yikes!  Should it be less?  Absolutely.  But it likely isn’t, based on the horrible things I’ve seen – probably people building as though they’re charged by usage and not by drop.

The 10x Rack servers if I switch them to 110V vs 208V indicate they need the 3.5A each – which is across both circuits..  This I think is at max, but let’s be fair and say you wouldn’t put more than 3x (10.5A) on a 15A circuit.  So you need 4x $400 pairs for $1600/month in power.  Alternatively, you could put them all on a 208V/30A pair for 21A total, for $2000/month.  If you could, this would be the better option as it lets you use only one pair of PDU’s, and you have surplus for putting in extra growth, Top of Rack switching, etc. 

So potentially, you’re also going to go from $24K to $2K/month in power.  For the sake of argument, let’s assume I’m way wrong on the blades, and it’s using half the power or $12K.  You’re still saving $10K/month – or $360K over 36 months.  Did you want a free SAN for your DR site maybe?  Don’t forget to not include the numbers previously based on usage vs drop power, or your double dipping on your savings. 

(New) Total: $427K to the good – AFTER getting your new equipment. 

Hi.  I just saved you half a million bucks

Categories: Dell, Design, Hardware, VMware

Design Exercise-Fixing Old or Mismatched Clusters

October 2, 2014 Leave a comment

In two previous posts, I talked about some design examples I’ve seen:

Design Exercise – Scaling up vs Scaling out

Design Exercise – DR Reuse

Today I’m going to talk about the “No problem, we’ll just add a host” problem.  But not in the “one more of the same” scenario, instead a “we can’t get those any longer, so we’ll add something COMPLETELY different” scenario.

Regardless of if the current site is something like previously described with matching systems (eg: 4x Dell PE2950’s) or random systems, often when capacity runs out, budget is likely low, and so the discussion comes up to “Just add a host”. But as we know from previous examples, adding additional hosts costs money for not only hardware, but licences. I have two different example sites to talk about:

Example 1:

  • 4x Dell PE2950, 2x 4 Core, 32GB RAM, 4x 1GBE hosts

Example 2:

  • 1x Dell T300, 1x 4 Core, 32GB RAM, 2x 1GbE
  • 1x HP DL380 G6, 2x 4 Core, 64GB RAM, 4x 1GbE
  • 1x Dell R610, 2x 6 Core, 96GB RAM, 4x 1GbE

In both cases, we’ll assume that the licencing won’t change as we’re not going to discuss actually adding any hosts, so all software/port counts remain the same.

As you can see, neither environment is particularly good. They both old, but Example 2 is horribly mismatched. DRS is going to have a hell of a time finding proper VM slots to use, the capacity is mis-matched, and nothing is uniform. The options to fix this all involve investing good money after bad. But often an environment that is this old or mismatched, likely ended up this way due to lack of funds. We can talk about proper planning and budgeting until we’re blue in the face, but what we need to do right now is fix the problem. So let’s assume that even if we could add or replace one of the hosts with something more current, like your $7000 R620 with 2x 6 Core and 128GB, this is not in budget. Certainly, 3-4 of them is not, and certainly not for the bigger/better systems at $10K+.

So what if we go used? Ah, I can hear it now, the collective rants of a thousand internet voices. “But we can’t go used, it’s old and it might fail, and it’s past it’s prime”. Perhaps – but look at what the environments currently are. Plus, if someone had something ‘newer’, that they’d owed for 2 years into a 3-5 year warranty, it would be “used” as well, no? Also, accepting for complete and spontaneous host failures, virtualization and redundancy gives us a lot of ways to mitigate actual hardware failures. Failing network ports, power supplies, fans, etc, will all trigger a Host Health alert. This can be used to automatically place the host in Maintenance Mode and have DRS evacuate it and send you an e-mail. So yes, a part may fail, but we build _expecting_ that to be true.

Now assume that the $7000 option for a new host *IS* in budget. What could we do instead? We certainly don’t want to add a single $7000 host to the equation, for all the reasons noted. Now we look into what we can do with off-lease equipment. This is where being a home-labber has its strengths – we already know what hardware is reliable and plentiful, and still new enough to be good and not quite old enough to be a risk.

What if I told you that for about $1500 CAD landed, you could get the following configuration:

Example 1 can now, for around $6000 CAD, replace all 4 hosts with something newer, that will have 16 more cores, and 4x the RAM. It’s not going to be anywhere near the solution from the other day with the 384GB hosts – but it’s also not going to be $40K in servers. Oh, plus 8U to 4U, power savings, etc.

Example 2 is able to replace those first 2 hosts and standardize, for around $3000.

In either case, they’re still “older” servers. A Dell R610 is circa 2009-2012, so you’re still looking at a 2-5 year old server at this point – which might be a little long in the tooth. But if the power is enough for you, and you’re just trying to add some capacity and get out of “scary old” zone, it might not be so bad. Heck, either of these sites are likely going to be very happy with the upgrades. Questions will need to be answered such as:

  • Lifespan – how long are we expecting these servers to be a solution for? Till the end of next calendar year or about 16-18 months? That’s fine.
  • Budget – are we doing this because we have run out of budget for this year but *NEED* “something”? Has next year’s budget been locked away and this was ‘missed’, but you still need ‘something’?  If we assume these are 18 month solutions, to get us from now (Oct 2014) to “after next budget year” (Jan 2016), then Example 1 is $333/month and Example 2 is $167/month. Money may be tight, but that’s a pretty affordable way of pushing off the reaper.  Heck, I know people with bigger cell phone bills.
  • Warranty – these may or may not come with OEM warranty. Are you okay with that? Maybe what makes the most sense is just pick up an extra unit for “self-warranty” – it is almost certainly still cheaper than extending the OEM warranty. Remember though, OEM support also helps troubleshoot weird issues and software incompatibilities, etc. Self-warrantying just gets you hard parts, that you can swap – if you have time and energy to do so. Check if the secondary market reseller will over next day parts, that may be sufficient for you. Also, check if the vendor of the hardware you’re choosing will allow you to download software updates (eg: management software, firmware, BIOS, etc) without a service contract. Dell, at this point, still does, which is why I like them (for customers and my lab).  Oh, an advantage of the extra unit for “self-warranty”?  You can use it for Dev/Test, learning, testing things you want to try, validating hardware configurations, swapping parts for testing suspected issues, etc.
  • Other Priorities – do you need to spend the same money you’d spend on new hosts, elsewhere? Maybe you need a faster SAN today, because you’re out of capacity as well, and you have to make a choice. You can fix it next year, but you can’t fix both at once, regardless of effort or good intentions. Maybe you want to go to 10GbE switches today in preparation. Perhaps you want to spend the same money on training, so that your staff can “do more with less” and have “smarter people” instead of “more thingies, with no one to run them”.

I fully realize that off-lease, eBay, secondary market is going to throw up automatic “no’s” for a lot of people. Also, many management teams will simply say no. Some will have an aversion to “buying from eBay” – fine, call the vendor from their eBay auction, and get a custom quote with a PO directly, and but it just like you would from any other VAR. The point of the matter is, you have options, even if you’re cash strapped.

BTW, if anyone was thinking “why not just get R620’s” which are newer, you certainly could – http://www.ebay.ca/itm/DELL-POWEREDGE-R620-2-x-SIX-CORE-E5-2620-2-0GHz-128GB-NO-HDD-RAILS-/111402343301?pt=COMP_EN_Servers&hash=item19f018db85. One can get an R620, 2x 6 Core E5-2620, 128GB RAM (16x8GB almost certainly, but 24 DIMM slots), 4x 1GbE, iDRAC, etc, for about $3000. This would give you more room to grow and is newer equipment, but it starts getting much closer to the $7000 configuration direct from Dell with 3 year warranty, 10GbE ports, etc. Still, 4x $3K is much less than 4x $7K, and $16,000 is a lot of money you could spend on something else. Just watch you’re not paying too close to retail for it to be not be worth it.

The trick, coming from a home-lab guy, is to be “just old enough to not be worth any money to someone else” but “just new enough to still be really useful, if you know what you’re doing.”

Also, consider these options for the future.  Remember that ROI involves a sale.  Let’s say you purchased the brand new $7000 servers and made it 5 year warranty vs 3 year for… 20% more or about $8500.  You’re almost certainly not going to use it for 5 years.  But in 2.5 years, when you want to put that server on the secondary market, and it still has 2+ years of OEM warranty left – you’re going to find it has significantly more resale value.

This is no different than leasing the ‘right’ car with the ‘right options’, because you know it’ll have a higher resale value at the end of the lease.  If you’re the kind of person that would never “buy new, off the lot” and would always buy a “1-2 year old lease-return, so someone else can pay the depreciation” – this solution is for you.

If in one scenario you haul the unit away to recycling (please, call me, I offer this service for free Smile), and another you sell the equipment to a VAR for $2000/unit you can use as credit on your next purchase or services…

Categories: Dell, Design, Hardware, VMware, vSphere

Got 10GbE working in the lab–first good results

October 2, 2014 12 comments

I’ve done a couple of posts recently on some IBM RackSwitch G8124 10GbE switches I’ve picked up.  While I have a few more to come with the settings I finally got working and how I figured them out, I have had some requests from a few people as to how well it’s all working.   So a very quick summary of where I’m at and some results…

What is configured:

  • 4x ESXi hosts running ESXi v5.5 U2 on a Dell C6100 4 node
  • Each node uses the Dell X53DF dual 10GbE Mezzanine cards (with mounting dremeled in, thanks to a DCS case)
  • 2x IBM RackSwitch G8124 10GbE switches
  • 1x Dell R510 Running Windows 2012 R2 and StarWind SAN v8.  With both an SSD+HDD VOL, as well as a 20GB RAMDisk based VOL.  Using a BCM57810 2pt 10GbE NIC
    Results:
    IOMeter against the RAMDisk VOL, configured with 4 workers, 64 threads each, 4K 50% Read/50% Write, 100% Random:

image

StarWind side:

image

Shows about 32,000 IOPS

And an Atto Bench32 run:

image

Those numbers seem a little high.

I’ll post more details once I’ve had some sleep, I had to get something out, I was excited Smile

Soon to come are some details on the switches, for ISCSI configuration without any LACP other than for inter-switch traffic using the ISL/VLAG ports, as well as a “First time, Quick and Dirty Setup for StarWind v8”, as I needed something in the lab that could actually DO 10GbE, and  had to use SSD and/or RAM to get it to have enough ‘go’ to actually see if the 10GbE was working at all.

I wonder what these will look like with some PernixData FVP as well…

UPDATED – 6/10/2015 – I’ve been asked for photos of the work needed to Dremel in the 10GbE Mezz cards on the C6100 server – and have done so!  https://vnetwise.wordpress.com/2015/06/11/modifying-the-dell-c6100-for-10gbe-mezz-cards/

Design Exercise–DR or Dev/Test Re-use

October 1, 2014 1 comment

In a previous post, I recently discussed some of the benefits of Scaling Up vs Scaling Out (https://vnetwise.wordpress.com/2014/09/28/design-exercise-scaling-up-vs-scaling-out/) and how you can save money by going big. In that example, the site already had 4 existing hosts, wanted 5 new ones, but settled on 3. We can all guess of course what the next thing to get discussed was, I’m sure…

“So let’s reuse the old 4 hosts, because we have them, and use them… for DR or a DEV/TEST environment”. This should be no surprise, that “because we have them” is a pretty powerful sell. Let’s talk about how that might actually cost you considerably more money than you should be willing to spend.

Just as quick reminder, a summary of the hardware and configurations in question:

OLD HOSTS: Dell PowerEdge 2950 2U, 2x E5440 2.8GHz 4 Core CPU, 32GB DDR2

NEW HOSTS: Dell PowerEdge R620 1U, 2x E5 2630L 2.4GHz 6 Core CPU, 384GB DDR3

1) Licencing

Our example assumes that the site needed new licencing for the new hardware – either it didn’t have any, it expired, it was the wrong versions, who knows. So if you reutilize those 4 hosts, you’re going to need 4-8 licences for everything. Assuming the same licence types and versions (eg: vSphere Enterprise Plus, Windows Server Data Center, Veeam Enterprise, etc. ) that works out to be:

  • 4x $0 hosts as above = $0
  • 8x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $28,000
  • 4x Windows Server Data Center licences @ ~ $5000/each = $20,000
  • 8x Sockets of Veeam B&R licences @ ~ $1000/each = $8,000

Total Cost = $56,000

Total Resources = 89GHz CPU, 128GB RAM

That’s a lot of licencing costs, for such little capacity to actually run things.

2) Capacity

That’s only 128GB of RAM to run everything, and 96GB when taking into account N+1 maintenance. Even if it IS DEV/TEST or DR, you’ll still need to do maintenance. These particular servers COULD go to 64GB each, using 8GB DIMM’s, but they’re expensive and not really practical to consider.

3) Connectivity

Let’s assume part of why you were doing this, is to get rid of 1GbE in your racks. Maybe they’re old. Maybe they’re flaky. Maybe you just don’t want to support them. In either event, let’s assume you “need” 10GbE on them, if for no other reason but so that your Dev/Test *actually* looks and behaves like Production. No one wants to figure out how to do things in Dev with 12x1GbE and then try to reproduce it in Prod with 4x10GbE and assume it’s all the same. So you’ll need:

  • 8x 2pt 10GbE PCIe NICs @ $500 each = $4000
  • 8x TwinAx SFP+ cables @ $50 each = $400

We’ve now paid $4400 to upgrade our hosts to be able to use the same 10GbE infrastructure we were using for Prod. For servers that are worth maybe $250 on Kijiji or eBay (http://www.ebay.ca/itm/Dell-Poweredge-III-2950-Server-Dual-Quad-Core-2-83GHz-RAID-8-Cores-64Bit-VT-SAS-/130938124237?pt=COMP_EN_Servers&hash=item1e7c8537cd). Not the best investment.

4) Real Estate / Infrastructure

Re-using these existing hosts means 8U of space, probably 2x the power required, likely internal RAID and disks that is just burning up power and cooling.

A quick summary shows that we’ve now spent somewhere in the area of $60,000 to “save money” by reusing our old hardware. This will take up 8U of rack space, probably consume 1600W of power, and we’re investing hardware in very old equipment.

But what if we did similar to what we did with the primary cluster for Prod, and just bought… 2 more of the bigger new hosts.

2x Dev/Test Hosts @ 384GB:

  • 2x $11,500 hosts as above = $23,000
  • 4x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $14,000
  • 2x Windows Server Data Center licences @ ~ $5000/each = $10,000
  • 4x Sockets of Veeam B&R licences @ ~ $1000/each = $4,000
  • 8x SFP+ TwinAx cables @ ~ $50/each = $400

Total Cost = $51,400

Total Resources = 57.6GHz CPU, 768GB RAM

Compared to:

Total Resources = 89GHz CPU, 128GB RAM

So we’ve now spent only $51,400 vs $60,000, and ended up with 6x the capacity on brand new, in warranty, modern hardware. The hardware is 100% identical to Prod. If we need or want to do any sort of testing in advance – vSphere patches, Firmware Upgrades, hardware configuration changes, we can now do so in Dev/Test and 100% validate that it will behave EXACTLY that way in Prod as it IS in fact exact. All of your training and product knowledge will also be the same, as you don’t have to consider variances in generations of hardware. We’re also going to use 2U and probably 600W of power vs 8U and 1600W.

If this is all in one site, and being used as Dev/Test you have a couple of ways you could set this up. We’re assuming this is all on the same SAN/storage, so we’re not creating 100% segregated environments. Also, the 10GbE switching will also be shared. So do you make a 3 node Prod and a 2 node Dev/Test cluster? Or do you make a 5 node cluster with a Prod and Dev/Test Resource Pool and use NIOC/SIOC to handle performance issues?

If this is for a second site, to potentially be used as DR, we’ve now saved $30K on the original solution and $8K on the solution we’re discussing now. This is $38K that you could spend on supporting infrastructure for your DR site – eg: 10GbE switching and SAN’s, which we haven’t accounted for at all. Granted $38K doesn’t buy a lot of that equipment – but it SURE is better than starting at $0. You just got handed a $40K coupon.

So, when you feel the urge to ask “but what should we do with this old hardware, can’t we do anything with it?” – the answer is “Yes, we can throw it away”. You’ll save money all day long. Give it to the keeners in your environment who want a home lab and let them learn and explore. If you really have no one interested… drop me a line. I ALWAYS have room in my lab or know someone looking. I’ll put it to use somewhere in the community .

Categories: Dell, Design, Hardware, VMware, vSphere

HOWTO: Fix Dell Lifecycle Controller Update issues

September 29, 2014 2 comments

Let’s say you’re in the middle of upgrading some Dell 11g hosts. They all have iDRAC 6 and Lifecycle Controllers, and you’ve downloaded the latest SUU DVD for this quarter. Then you want to update everything. You reboot the host, you press F10 to enter the LCC, you tell it to use the Virtual Media mounted SUU DVD ISO that it recognizes, it finds your updates, and you say go… only to get this:

clip_image001

Uh. So who authorizes them, because this is from a Dell SUU DVD, that’s about as good as I can get.

Turns out, I’m not the first person to have this problem, though it’s an older issue:

http://www.sysarchitects.com/solved-updates-you-are-trying-apply-are-not-dell-authorized-updates

http://en.community.dell.com/support-forums/servers/f/177/t/19475476

http://frednotes.wordpress.com/2012/11/21/the-updates-you-are-trying-to-apply-are-not-dell-authorized-updates/

It looks like the issue is that the LCC is at 1.4.0.586 currently – and needs to be 1.5.2 or better. 1.6.5.12 is current as of my SUU DVD, as you can see above. The other problem, is that Dell provides updates “in OS” for Linux and Windows – which doesn’t really help ESXi hosts at all. Seems the solution for this is a “OMSA Live CD” which I’ve never heard of until today. This can be found at: http://linux.dell.com/files/openmanage-contributions/om74-firmware-live/ and really good instructions on its use at: http://en.community.dell.com/techcenter/b/techcenter/archive/2014/03/20/centos-based-firmware-images-with-om-7-4-with-pxe

Now, the other alternative should have been to mount the SUU ISO as Virtual Media ISO and boot from it. But for whatever reason, this isn’t working and after selecting it, it just boots the HDD. I’m assuming, this is because the firmware on the iDRAC/LCC is too old and having some issues booting the ISO. That’s fine. I didn’t troubleshoot it too much after it failed 3 times in a row. I dislike hardware reboots that take 10 minutes, which is why I like VM’s, so I went looking for an alternative solution, and was happy with it.

When the system boots from the OM74 Live CD, it will auto-launch the update GUI:

clip_image002

Right now, you only need to do the Dell Lifecycle Controller. You could of course do more, but the point for me is to get the LCC working, then move back to doing the updates via that interface. So we’ll ONLY do the one update from here.

Click UPDATE FIRMWARE, and then:

clip_image003

Click UPDATE NOW in the upper left corner. You can see the STATUS DESCRIPTION showing it is being updated.

When the update is complete, you can then reboot the system and retry using the Unified System Configurator/Lifecycle Controller to complete the rest of your updates. (HOWTO- Using Dell iDRAC 7 Lifecycle Controller 2 to update Dell PowerEdge R420, R620, and R720 s would be a good place to look)

Design Exercise- Scaling up vs Scaling out

September 28, 2014 3 comments

Most people who know me, know that I have a thing for optimization and efficiency in the data center. One way I like to do so is by scaling up vs scaling out – and wanted to show an example of how this could work for you, if this is an option.

Recently, I had a client ask to replace some of their older servers and refresh their cluster. As they were still running (very tightly) on 4x Dell PowerEdge 2950’s with 32GB of RAM, their needs were clearly not super extensive. But they needed new hardware. For the sake of argument, let’s assume they also required licences – suppose their support agreements expired some time ago (by choice or omission, it doesn’t really matter). So we need new licences. The client knows they need newer/better hosts, and also “wants room for growth”. All well and good.

The request was to purchase 5x Dell PowerEdge R620 servers. Dual 6 core CPU (CPU usage sits about 30-40% for these users) and 128GB RAM (8x16GB) to keep the costs down. All systems would be diskless, booting via SD/USB, so no RAID controllers or local disks. Quad 10GbE NIC’s would be required, 2x for Data networks and 2x for Storage. Pretty basic stuff.

First, the general costs. Dell.ca still lets me build 12G servers, so let’s build one out with the above specs:

  • $7200 – 2x Intel E5-2630L 60w 6 Core 2.4GHz CPU, 8x16GB DDR3, 2xSD, 2x1GbE+2x10GbE Broadcom on board, 2x 10GbE Broadcom add-in card, redundant PSU

You may certainly choose different configurations, I simply chose one that gave me a good baseline here for an example.  Only the memory and potentially the CPU’s are changing throughout.

If we were to go ahead, as requested, we would need:

  • 5x $7200 hosts as above = $36,000
  • 10x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $35,000
  • 5x Windows Server Data Center licences @ ~ $5000/each = $25,000
  • 10x Sockets of Veeam B&R licences @ ~ $1000/each = $10,000
  • 20x SFP+ TwinAx cables @ ~ $50/each = $1000

Total Cost = $107,000
Total Resources = 144GHz CPU, 640GB RAM

But what if you could do it with less hosts? They’d need to be more beefy for sure. But as this site only runs with 30-40% CPU load, we can increase the RAM and leave the CPU’s the same, and obtain better density. If we re-price the configuration with 16x16GB for 256GB total, we get a price of $9300. 24x16GB for 384GB total, we get a price of $11,500. The first reaction to this is usually something like “The hosts are 50% more, we can’t afford that”. Which usually fails to acknowledge that you no longer need as many. Let’s do the same math above, but with both new option:

256GB:

  • 4x $9300 hosts as above = $37,200
  • 8x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $28,000
  • 4x Windows Server Data Center licences @ ~ $5000/each = $20,000
  • 8x Sockets of Veeam B&R licences @ ~ $1000/each = $8,000
  • 16x SFP+ TwinAx cables @ ~ $50/each = $800

Total Cost = $94,000
Total Resources = 115GHz CPU, 1024GB RAM

384GB:

  • 3x $11,500 hosts as above = $34,500
  • 6x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $21,000
  • 3x Windows Server Data Center licences @ ~ $5000/each = $15,000
  • 6x Sockets of Veeam B&R licences @ ~ $1000/each = $6,000
  • 12x SFP+ TwinAx cables @ ~ $50/each = $600

Total Cost = $77,000
Total Resources = 86.4GHz CPU, 1152GB RAM

We’ve managed to potentially save $30,000, 2U of rack space, and a bunch of licencing in this scenario. There are some things to consider, however:

1) CPU requirements

What IF you couldn’t tolerate the CPU resource drop? If your environment is running at 80%+ CPU usage, then memory scaling isn’t going to help you. First thing to check though, would be if you have a bunch of VM’s that don’t have VMware tools installed and/or are constantly pinning the CPU because of some errant task. Fix that, you may find you don’t need the CPU you thought you did.

Upgrading to E5-2670 v2 2.5GHz 10 Core CPU’s brings the cost up to $13,500 per host or $2000 extra. But you go from 28.8GHz to 50GHz per host – or 150GHz for all 3 nodes. So you ‘only’ save $24,000 in this example then – still 22%

2) RAM gotchas

Check that populating all the memory slots doesn’t drop the memory speeds, as some configurations do. In this case, many environments I’ve seen would still ‘prefer’ to have MORE RAM than FASTER RAM. That may measure out to be untrue, but when internal customers are asking for double the memory per VM, they don’t care about how fast but how much. So you need to look at IF this will occur and IF you care.

3) N+1 sizing.

Remember you want to do maintenance. If you have 5 hosts and take out 1, you still have 4. If you have 3, you only have 2 left. Do you still have enough capacity. Let’s look at max and N+1 sizes:

5 Node = 144GHz CPU / 640GB or 115GHz / 512GB in maintenance

4 Node = 115GHz CPU / 1024GB or 86GHz / 768GB in maintenance

3 Node = 86GHz CPU / 1152GB or 57.6GHz / 768GB in maintenance.

So again, assuming RAM is your key resource, in either of the 4 or 3 node situations, you actually still exceed the 5 node cluster capacity at 100% health, while in maintenance.

4) Expandability

In my example, the 24 DIMM slot servers are maxed out from day one. The 5 node solution has 16 free slots, and could be upgraded as required. However, it has already paid $30,000 more, so that additional memory will only raise that delta. I’ve seen many environments where the attitude was “we’ll just add a host” – but fails to consider the licencing, ports, sockets, rack space, etc, required to “just add one more” – and it’s seldom just one more, when you start playing that game. I’ve seen environments where people had 10 hosts with 128GB or 192GB and wanted to add 5 more – rather than just replace the first 10 hosts with something with better density.

5) Lifetime maintenance costs

Now that you’ve managed to reduce the size of your cluster by 40%, that 40% is saved year after year. Consider all the things that might need to be done on a “per host” basis – patching, firmware updates, preventative maintenance, cabling, labelling, documentation, etc. Every time you do something “cluster wide”, you’ve likely reduced that workload by some amount.

This same example works just as well if your original cluster needed to be 10 or 15 nodes – and you chose to make it only 6 or 9. So this isn’t just a play for the “little guy” – and if environments supporting 1TB of RAM is “the little guy”….

Now, one thing I see, which is unfortunately part of the “corporate budget game” is the whole “well if we don’t use the budget, we’ll lose it” scenario. It drives me up the wall, but I get it, and it exists. So let’s assume you absolutely HAD to spend that original $107K, and burn through the $30K – and what could you do with it:

  • Spend some money on a DR “host” with DAS. It wouldn’t run everything, but it could be better than what you don’t have today
  • Maybe VSAN starts being a plausible option? One argument I tend to hear against it, is cost. At $7500/node, 3 is much less than 5 – and you could likely pay for your disks.
  • Host based caching like PernixData could be something you throw in, to drastically assist with storage IO performance
  • Maybe those 10GbE switches you’ve been wanting can finally be bought – heck, this basically makes them “buy the servers, the switches are free”
  • Training and consulting. Maybe you could now afford to send some guys on training. Or pay to get it installed if your team doesn’t have the skillsets in house

Something to keep in mind.

Categories: Dell, Design, Hardware, VMware

HOWTO: IBM RackSwitch G8124 – Stacking and Port Channels

September 26, 2014 Leave a comment

Welcome to a work in progress J I fully suspect I’ll end up having to circle around and update some of this as I actually get more opportunity to test. I’m still working on some infrastructure in the lab to let me test these switches to their fullest, but in the meantime I’m looking to try to figure out how to get them setup the way I would if I had them at a client site. In general, this means supporting stacking or vPC LACP Port Channels, and connectivity to Cisco Nexus 5548’s.

I managed to find a PDF that shows just such a configuration: http://www.fox-online.cz/ibm/systemxtraining/soubory/czech-2013-bp-final_slawomir-slowinski.pdf

The first figure covers a scenario with teamed NIC’s, with either a Windows host or vSphere ESXi with vDS and LACP:

clip_image001

The second option shows how one might do it with individual non-teamed NIC’s:

clip_image002

The importance of these slides is that the confirm:

  • Cisco Nexus vPC connectivity if certainly a valid use case.
  • The IBM/BNT/Blade terminology for vPC is vLAG – I can live with that

What isn’t shown on THESE slides is some model information:

  • IBM G8000 48x 1GbE switches DO support stacking
  • IBM G8052 52x 1GbE switches do NOT support stacking, but support vLAG
  • IBM G8124 24x 10GbE switches do NOT support stacking, but support vLAG
  • IBM Virtual Fabric 10GbE BladeChassis switches DO support stacking

So there goes my hope for stacking. Not really the end of the world, if it supports vPC(vLAG). So with that in mind, we’ll move on.

I did manage to find a fellow who’s documented the VLAG and VRRP configuration on similar switches: http://pureflexbr.blogspot.ca/2013/10/switch-en4093-vlag-and-vrrp-config.html

So with some piecing together, I get, for Switch 2 (Switch 1 was already configured):

# Configure the LACP Trunk/Port-Channel to be used for the ISL, using ports 23 and 24

interface port 23-24

tagging

lacp mode active

# Set the LACP key to 200

lacp key 200

pvid 4094

exit

!

# Configure VLAN 4094 for the ISL VLAN and move the ports into it.

vlan 4094

enable

name "VLAN 4094"

member 23-24

!

# Set a new STPG of 20 with STP disabled

no spanning-tree stp 20 enable

# Add ports 23 and 24 to said STPG

interface port 23-24

no spanning-tree stp 20 enable

exit

# Create the VLAN and IP Interface

interface ip 100

# Remember that this is on Switch2, so it is using IP2

# Change this when configuring Switch1

ip address 10.0.100.252 255.255.255.0

# configure this subnet configuraiton for VLAN4094

vlan 4094

enable

exit

!

# Configure the vLAG

vlag tier-id 10

# Indicate that the ISL VLAN is 4094

vlag isl vlan 4094

# As we’re on Switch2, this IP will be for Switch1 as the Peer

vlag hlthchk peer-ip 10.0.100.251

# Specify that same LACP ISL key of 200

vlag isl adminkey 200

# Enable the VLAG

vlag enable

!

If all goes well, you’ll see:

clip_image003

Sep 25 22:58:02 NW-IBMG8124B ALERT vlag: vLAG Health check is Up

Sep 25 22:58:11 NW-IBMG8124B ALERT vlag: vLAG ISL is up

Now, the questions I have for this:

· How do I create an actual vLAG – say using Ports 20 on both switches?

· What traffic is passing on this vLAG ISL? Is this just a peer-configuration check, or is it actually passing data? I’m going to assume it’s functioning as a TRUNK ALL port, but I should probably sift through the docs

· When will I have something configured that can use this J

Expect me to figure out how to configure the first in the next few days. It can’t be that much harder. In the meantime, I’m also building up a HDD+SSD StarWind SAN in a host with 2x 10GbE SFP+ that should let me configure port channels all day long. For now, I don’t really need them, so it might be a bit before I come back to this. Realistically, for now, I just need ISCSI, which doesn’t really want any LACP, just each switch/path to be in its own subnet/VLAN/fabric, with individual target/initiator NIC’s, unteamed. So as soon as I get a device up that can handle 10GbE traffic, I’ll be testing that!

HOWTO: IBM RackSwitch G8124 – Initial Configuration

September 23, 2014 8 comments

With the acquiring of my new G8124F 10GbE switches (https://vnetwise.wordpress.com/2014/09/20/ibm-rackswitch10gbe-comes-to-the-lab/) , we need to look at the basic configuration. This is going to include general switch management that will be generic to any switches, such as:

  • Setting hostname and management IP on the OoB interface
  • DNS, SysLog, NTP
  • Management users
  • Confirming we can back up the config files to a TFTP server
  • RADIUS – I expect to need a HOWTO of its own, largely because I’m going to have to figure out what the RADIUS Server side requires

Information we’ll need:

Top Switch:

  • Hostname: NW-IBMG8124A
  • IP: 10.0.0.94
  • MGMT_A: NW-PC6248_1/g39 – VLAN 1 – Access
  • p24 -> NW-IBMG8124B/p24
  • p23 -> NW-IBMG8124B/p23
  • p01 -> NW-ESXI04 vmnic5

Bottom Switch:

  • Hostname: NW-IBMG8124B
  • IP: 10.0.0.95
  • MGMT_A: NW-PC6248_1/g39 – VLAN 1 – Access
  • p24 -> NW-IBMG8124A/p24
  • p23 -> NW-IBMG8124A/p23

Common Information:

  • Subnet: 255.255.255.0
  • Gateway: 10.0.0.1
  • DNS1: 10.0.0.11
  • DNS2: 10.0.0.12
  • NTP: 10.0.0.11
  • SysLog: 10.0.0.10

Manual Links:

What you can tell from above, is that ports 23/24 are linked together with a pair of Cisco passive DAC SFP+ TwinAx cables. Port 1 on the top switch is connected to an unused 10GbE port on an ESXi host so we can do some basic testing. Both switches have their MGTA ports connected to my current Dell PowerConnect 6248 switches, on ports {Top/Bottom}/g39 respectively, with no VLAN trunking. This won’t really matter for the basic configuration we’re doing now, but it will once we start configuring data ports vs simply management interfaces.

1) Initial Login:

I was going to use my Digi CM32 and an RJ45 cable and converter to connect to the DB9, however, both the cable and my converters are both female and I have no serial gender benders on hand. So instead, I opted to use two serial ports on two ESXi hosts, and connect the COM port to a VM. Note, you will have to power down the VM to do so, and it will prevent vMotion, etc. I’m using disposable VM’s I use for benchmarking and testing, so this isn’t a concern. Port speeds are whatever the default PuTTY assumes – 9600,8,N,1, I’m sure.

clip_image001

First, the hard part. The default password is “admin” with no password.

2) Enter configuration:

clip_image002

The first thing you’ll notice, is that so far, this feels very Cisco like. To get started, we enter the “enable” mode and then “conf t” to configure from the terminal.

Command:

enable

configure terminal

3) Let’s confirm our running configuration:

clip_image003

Yup. That’s pretty reset to factory.

Command:

show running-config

4) As per the manual, we’ll set up the management IP’s on both switches:

clip_image004

Page 44 suggests the following commands:

interface ip-mgmt address 10.0.0.94

interface ip-mgmt netmask 255.255.255.0

interface ip-mgmt enable

interface ip-mgmt gateway 10.0.0.1

interface ip-mgmt gateway enable

However, as you can see above, it appears that the version of the firmware I’m running has two options for “interface ip-mgmt gateway” – address w.x.y.z and enable. So the actual commands are:

Commands:

interface ip-mgmt address 10.0.0.94

interface ip-mgmt netmask 255.255.255.0

interface ip-mgmt enable

interface ip-mgmt gateway address 10.0.0.1

interface ip-mgmt gateway enable

clip_image005

You can expect to see a message like the above when the link comes up. In my case, this was because I didn’t configure the Dell PC6248’s until after doing this step.

5) Set the hostname:

clip_image006

Command:

hostname NW-IBMG8124B

We can set the hostname. Note that it changes immediately.

6) Now would be a good time to save our work:

clip_image007

Just like on a Cisco, we can use:

wr mem

or

copy running-config startup-config

Note the prompt above – because the switch is restored to factory defaults, it is booting in a special mode that bypasses any existing configurations. This is why it confirming if you want your next boot to go to the current running/startup config.

7) Set NTP server(s):

clip_image008

You will need to configure at least the “primary-server” if not also the “secondary-server” with an IP address as well as the PORT on the switch that will do the communication. In my case, I’ll be letting the mgta-port connect out, but this could easily be a data port on the switch as well. Do note that it requires an IP address, so you won’t be able to use DNS names such as “ntp1.netwise.ca”, unfortunately. Then, enable the NTP functionality.

Command:

ntp primary-server 10.0.0.11 mgta-port

ntp enable
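Before pointing the switch at it, it doesn’t hurt to confirm the NTP server actually answers. Here’s a stdlib-only SNTP query sketch against 10.0.0.11, the server used above:

# Sanity-check that the NTP server answers a basic SNTP query over UDP/123.
import socket
import struct
import time

NTP_SERVER = "10.0.0.11"
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)

packet = b"\x1b" + 47 * b"\0"  # LI=0, VN=3, Mode=3 (client request)
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.settimeout(3)
    sock.sendto(packet, (NTP_SERVER, 123))
    data, _ = sock.recvfrom(512)

# The transmit timestamp's seconds field lives at bytes 40-43 of the reply.
ntp_seconds = struct.unpack("!I", data[40:44])[0]
print("Server time:", time.ctime(ntp_seconds - NTP_EPOCH_OFFSET))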

You’ll note I made a typo, and used the wrong IP. That actually worked out well for the documentation:

clip_image009

Once I corrected the IP, you can see the console immediately display that it has updated the time.

This is also a good time (pun intended) to set up your timezone. You can use the “system timezone” command to be prompted via menus to select your numbered timezone. As I had no clue what my number might be for Alberta (DST-7?), I ran through the wizard – then checked the running config:

clip_image010

There we go. Command to set America/Canada/Mountain-Alberta as your timezone:

system timezone 93

8) Setup an admin user:

clip_image011

User access is a little different from a Cisco switch. Here we need to set the name, enter a password, give the user a level, and then enable the user. Note that you cannot enter the password at the command line – it will interactively prompt you. So there’s no point entering a password in the config.

Commands:

access user 10 name nwadmin

access user 10 password

access user 10 level administrator

access user 10 enable

The running-config shows the password command as:

access user 10 password "f2cbfe00a240aa00b396b7e361f009f2402cfac143ff32cb09efa7212f92cef2"

This suggests you can provide the (hashed) password at the command line non-interactively – handy when replaying a saved config.

It is worth noting the built in “administrator” account has some specialty to it. To change this password you would use:

access user administrator-password <password>

Setting the password to blank (null) will disable the account. Similar also exists for “operator-password” for the “oper” account, but it is disabled by default.

9) Setup SSH:

At this point, the switches are on the network, but I’m still configuring them via serial console. If we attempt to connect to them, we’ll realize that SSH doesn’t work but Telnet does – which is generally expected.

clip_image012

Commands:

ssh port 22

ssh enable

You should now be able to connect as the user you just created, AS WELL AS the default user – admin with a password of admin.
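To confirm the new account works over the network, a small sketch using the third-party paramiko package (the IP and username are the ones from this walkthrough; the password is obviously yours). Some switch SSH stacks only support an interactive shell rather than exec channels, in which case invoke_shell() is the fallback.

# Verify SSH login with the nwadmin user created above - assumes paramiko
# ("pip install paramiko"). Lab switch, so host key checking is relaxed.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("10.0.0.94", username="nwadmin", password="YourPasswordHere", timeout=10)

stdin, stdout, stderr = client.exec_command("show version")
print(stdout.read().decode(errors="replace"))
client.close()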

10) Disable Telnet

Now that we’ve configured SSH, let’s get rid of telnet. There is no equivalent “telnet disable”, but you can use “no …” commands.

clip_image013

Commands:

no access telnet enable

Note that my active Telnet sessions had their connections closed, and this is indicated on the console.
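A quick port check from a workstation confirms the end state – 22 open, 23 refused. A stdlib-only sketch:

# Confirm SSH is open and Telnet is now closed on the switch.
import socket

def port_open(host, port, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

host = "10.0.0.94"
print("SSH (22):   ", "open" if port_open(host, 22) else "closed")
print("Telnet (23):", "open" if port_open(host, 23) else "closed")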

11) Set SNMP:

My SNMP needs are basic – I largely use it for testing monitoring and management products. So we’ll just set a basic Read Only and Read Write community, and we’ll set it for SNMP v2, which is the most common:

clip_image014

Commands:

snmp location "NetWise Lab"

snmp name NW-IBMG8124B

snmp read-community "nw-ro"

snmp write-community "nw-rw"

snmp version v1v2v3

access snmp read-only

access snmp read-write

NOTE: The SNMP name will change the HOSTNAME, and should not include quotes. This makes me believe it would otherwise just assume the hostname, which is what most people set it to anyway.
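A quick way to prove the community strings work is to poll sysName from a workstation. A sketch that shells out to the Net-SNMP snmpget tool (assumes Net-SNMP is installed locally):

# SNMPv2c poll of sysName.0 - should return the switch hostname set above.
import subprocess

host = "10.0.0.94"
community = "nw-ro"                    # the read-only community configured above
sys_name_oid = "1.3.6.1.2.1.1.5.0"     # sysName.0

result = subprocess.run(
    ["snmpget", "-v2c", "-c", community, host, sys_name_oid],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)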

12) Configure HTTPS access:

Some people like HTTPS configuration access; some see it as a security risk. I’ll enable it so I have the option of seeing what it looks like.

clip_image015

Commands:

access https enable

If there is no self-signed certificate, it will generate one.
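Because the certificate is self-signed, browsers and tools will complain unless you skip verification. A sketch using the third-party requests package to confirm the web UI answers (lab use only – don’t disable verification in production):

# Poke the HTTPS management UI; skip certificate verification since the
# switch generated a self-signed cert. Assumes "pip install requests".
import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

resp = requests.get("https://10.0.0.94/", verify=False, timeout=10)
print(resp.status_code, resp.headers.get("Server", "unknown server"))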

13) Configure DNS

It would be nice if we could get DNS for hostname resolution. Nothing is worse than having to remember IP’s.

clip_image016

Commands:

ip dns primary-server 10.0.0.11 mgta-port

ip dns secondary-server 10.0.0.12 mgta-port

ip dns domain-name netwise.ca
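Since the switch only gets two DNS servers, it’s worth confirming both actually answer before relying on them. A sketch assuming the third-party dnspython package (2.x – older versions use query() instead of resolve()); the record looked up is just an example:

# Check both DNS servers configured above hand out answers for the lab domain.
import dns.resolver

for server in ("10.0.0.11", "10.0.0.12"):
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [server]          # force this query to one server
    try:
        answer = resolver.resolve("netwise.ca", "A")   # hypothetical test record
        print(server, "->", [r.address for r in answer])
    except Exception as exc:
        print(server, "-> lookup failed:", exc)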

14) Configure Spanning Tree

Any good switch should do some manner of Spanning Tree. As these will be my storage switches, we’ll ensure they’re set to protect against loops and are running Rapid Spanning Tree (RSTP).

clip_image017

Command:

spanning-tree loopguard

spanning-tree mode rstp

15) Configure SysLog:

clip_image018

This is pretty simple: we point it at the syslog server’s IP and tell it to use the mgta-port.

Command:

logging host 1 address 10.0.0.10 mgta-port

logging host 1 severity 7

logging log all

What is nice is that you can define a second one by specifying “host 2”.
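If you want to watch the messages land before your real syslog collector is set up, a throwaway listener works. A stdlib-only sketch to run on the syslog host (10.0.0.10 above); binding UDP/514 usually needs root/admin:

# Minimal UDP syslog listener to confirm the switch's messages arrive.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.bind(("0.0.0.0", 514))
    print("Listening for syslog on UDP/514 ... Ctrl+C to stop")
    while True:
        data, addr = sock.recvfrom(4096)
        print(addr[0], data.decode(errors="replace").strip())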

16) Backup the running config:

clip_image019

Configuring the switch isn’t much good if you don’t back up the configuration. So we’ll make a copy of the config to our TFTP server.

Command:

copy running-config tftp address 10.0.0.48 filename NW-IBMG8124B_orig.cfg mgta-port

It is worth noting that it does support standard FTP as well, if you desire.
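If you don’t already have a TFTP server running on 10.0.0.48, a throwaway one is easy to spin up. A sketch assuming the third-party tftpy package and a writable /srv/tftp directory (TFTP is UDP/69, so it typically needs root/admin):

# Throwaway TFTP server to receive the switch config backup.
import tftpy

server = tftpy.TftpServer("/srv/tftp")   # uploaded files land in this directory
server.listen("0.0.0.0", 69)             # blocks until interrupted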

So if we take all of the above and put the commands together, we get:

enable

conf t

interface ip-mgmt address 10.0.0.94

interface ip-mgmt netmask 255.255.255.0

interface ip-mgmt enable

interface ip-mgmt gateway address 10.0.0.1

interface ip-mgmt gateway enable

hostname NW-IBMG8124A

copy running-config startup-config

ntp primary-server 10.0.0.11 mgta-port

ntp enable

access user 10 name nwadmin

access user 10 password "f2cbfe00a240aa00b396b7e361f009f2402cfac143ff32cb09efa7212f92cef2"

access user 10 level administrator

access user 10 enable

#access user administrator-password <ChangeMe>

ssh port 22

ssh enable

no access telnet enable

snmp location "NetWise Lab"

snmp name NW-IBMG8124A

snmp read-community "nw-ro"

snmp write-community "nw-rw"

snmp version v1v2v3

access snmp read-only

access snmp read-write

access https enable

ip dns primary-server 10.0.0.11 mgta-port

ip dns secondary-server 10.0.0.12 mgta-port

ip dns domain-name netwise.ca

spanning-tree loopguard

spanning-tree mode rstp

logging host 1 address 10.0.0.10 mgta-port

logging host 1 severity 7

logging log all

We now have a basically working switch, from a management perspective.  Next will be to get it passing some actual data!
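Since the second switch needs the same base config, it’s tempting to replay the list above over SSH rather than retype it. A rough sketch assuming paramiko, a hypothetical second-switch IP of 10.0.0.95, and the commands saved one per line in a file called g8124-base.cfg. Note that the interactive “access user 10 password” prompt won’t replay cleanly – either handle it by hand or use the hashed form pulled from the running-config, as shown earlier.

# Replay a saved command list over SSH using an interactive shell, since
# many switch SSH stacks don't support exec channels. Assumes paramiko.
import time
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("10.0.0.95", username="admin", password="admin", timeout=10)  # hypothetical IP

shell = client.invoke_shell()
with open("g8124-base.cfg") as cfg:
    for line in cfg:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and commented-out lines
        shell.send(line + "\n")
        time.sleep(0.5)                   # crude pacing so the CLI keeps up
        if shell.recv_ready():
            print(shell.recv(65535).decode(errors="replace"), end="")

client.close()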

 

Some other interesting commands:

While poking around with the “list” command (in conf t mode), which shows you all the command options, I found some interesting ones:

boot cli-mode ibmnos-cli

boot cli-mode iscli

boot cli-mode prompt

The ISCLI is the “Industry Standard CLI” (read: “Is Cisco-Like”), which is why it seems familiar. The other option is the legacy IBMNOS-CLI, which is… probably painful.

 

boot configuration-block active

boot configuration-block backup

boot configuration-block factory

Here is how we can tell the switch to reset itself or boot clean. It’s not immediately clear to me how this would be better than “erase startup-config”, “reload”, but it’s there.

 

boot schedule friday hh:mm

boot schedule monday hh:mm

boot schedule saturday hh:mm

boot schedule sunday hh:mm

boot schedule thursday hh:mm

boot schedule tuesday hh:mm

boot schedule wednesday hh:mm

I can’t think of a lot of times I’ve wanted to schedule the reboot of switches on a weekly basis. Or reasons why I’d need to, on a good switch. But… maybe it’s to know that it WILL reboot when the time comes? If you reboot it weekly, then you might not be so timid to do so after the uptime is 300+ days and no one remembers if this is the switch that has startup issues?

 

interface ip-mgta address A.B.C.D A.B.C.D A.B.C.D enable

At first glance this looks like multiple IP’s on the management interface, but the three dotted quads are more likely the address, netmask, and gateway rolled into a single line.

interface ip-mgta dhcp

In case you want to set your management IP’s to DHCP. Which sounds like a fun way to have a bad day someday…

 

ldap-server backdoor

Not sure what on earth this does – presumably it enables a local “backdoor” login for when the LDAP server is unreachable, but that’s worth confirming in the manual.

 

ldap-server domain WORD

ldap-server enable

ldap-server primary-host A.B.C.D mgta-port

ldap-server secondary-host A.B.C.D mgta-port

Need to look into what LDAP supports

 

logging console severity <0-7>

logging console

Sets up how much is logged to the console

 

logging host 1 address A.B.C.D mgta-port

Configures syslog via the mgta-port

 

logging log all

Logs everything, but you can do very granular enablement.

 

radius-server backdoor

Not sure what on earth this does – likely the same idea as the LDAP one: a local fallback login if the RADIUS server is down. Worth confirming before enabling.

radius-server domain WORD

radius-server enable

radius-server primary-host A.B.C.D mgta-port

radius-server secondary-host A.B.C.D mgta-port

I’ll need to find the appropriate commands for both the switches as well as the RADIUS server to enable groups.

 

virt vmware dpg update WORD WORD <1-4094>

virt vmware dpg vmac WORD WORD

virt vmware dvswitch add WORD WORD WORD

virt vmware dvswitch add WORD WORD

virt vmware dvswitch addhost WORD WORD

virt vmware dvswitch adduplnk WORD WORD WORD

virt vmware dvswitch del WORD WORD

virt vmware dvswitch remhost WORD WORD

virt vmware dvswitch remuplnk WORD WORD WORD

virt vmware export WORD WORD WORD

I understood the switch was virtualization aware – but this is going to need some deeper investigation!

Categories: Hardware, Home Lab, IBM, RackSwitch