
Design Exercise: Scaling up vs Scaling out

Most people who know me know that I have a thing for optimization and efficiency in the data center. One way I chase that is by scaling up instead of scaling out, and I wanted to show an example of how this could work for you, if it’s an option.

Recently, a client asked me to replace some of their older servers and refresh their cluster. As they were still running (very tightly) on 4x Dell PowerEdge 2950s with 32GB of RAM, their needs were clearly not extensive, but they needed new hardware. For the sake of argument, let’s assume they also required licences – suppose their support agreements expired some time ago (by choice or omission, it doesn’t really matter), so we need new licences. The client knows they need newer/better hosts, and also “wants room for growth”. All well and good.

The request was to purchase 5x Dell PowerEdge R620 servers: dual 6-core CPUs (CPU usage sits around 30-40% for these users) and 128GB RAM (8x16GB) to keep costs down. All systems would be diskless, booting via SD/USB, so no RAID controllers or local disks. Quad 10GbE NICs would be required: two for data networks and two for storage. Pretty basic stuff.

First, the general costs. Dell.ca still lets me build 12G servers, so let’s build one out with the above specs:

  • $7200 – 2x Intel E5-2630L 60w 6 Core 2.4GHz CPU, 8x16GB DDR3, 2xSD, 2x1GbE+2x10GbE Broadcom on board, 2x 10GbE Broadcom add-in card, redundant PSU

You may certainly choose different configurations; I simply chose one that gave me a good baseline for an example. Only the memory and potentially the CPUs change throughout.

If we were to go ahead, as requested, we would need:

  • 5x $7200 hosts as above = $36,000
  • 10x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $35,000
  • 5x Windows Server Data Center licences @ ~ $5000/each = $25,000
  • 10x Sockets of Veeam B&R licences @ ~ $1000/each = $10,000
  • 20x SFP+ TwinAx cables @ ~ $50/each = $1000

Total Cost = $107,000
Total Resources = 144GHz CPU, 640GB RAM

But what if you could do it with fewer hosts? They’d need to be beefier, for sure. But as this site only runs at 30-40% CPU load, we can increase the RAM, leave the CPUs the same, and obtain better density. If we re-price the configuration with 16x16GB for 256GB total, we get a price of $9300; with 24x16GB for 384GB total, $11,500. The first reaction to this is usually something like “The hosts are 50% more, we can’t afford that” – which usually fails to acknowledge that you no longer need as many. Let’s do the same math as above, but with both new options:


  • 4x $9300 hosts as above = $37,200
  • 8x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $28,000
  • 4x Windows Server Data Center licences @ ~ $5000/each = $20,000
  • 8x Sockets of Veeam B&R licences @ ~ $1000/each = $8,000
  • 16x SFP+ TwinAx cables @ ~ $50/each = $800

Total Cost = $94,000
Total Resources = 115.2GHz CPU, 1024GB RAM


  • 3x $11,500 hosts as above = $34,500
  • 6x Sockets of vSphere Enterprise Plus @ ~ $3500/each with SnS = $21,000
  • 3x Windows Server Data Center licences @ ~ $5000/each = $15,000
  • 6x Sockets of Veeam B&R licences @ ~ $1000/each = $6,000
  • 12x SFP+ TwinAx cables @ ~ $50/each = $600

Total Cost = $77,100
Total Resources = 86.4GHz CPU, 1152GB RAM
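The three options above all come down to the same per-host cost model, so they’re easy to sanity-check. Here’s a quick Python sketch using the rough list prices quoted above (the function name and structure are mine, not anything official):

```python
# Rough list prices from the scenarios above
VSPHERE_PER_SOCKET = 3500   # vSphere Enterprise Plus with SnS
WINDOWS_PER_HOST   = 5000   # Windows Server Datacenter licence
VEEAM_PER_SOCKET   = 1000   # Veeam B&R licence
TWINAX_EACH        = 50     # SFP+ TwinAx cable, 4 per host

def scenario(hosts, host_price, ghz_per_host, ram_gb_per_host):
    """Return (total cost, cluster GHz, cluster RAM in GB) for a dual-socket cluster."""
    sockets = hosts * 2
    cost = (hosts * host_price
            + sockets * VSPHERE_PER_SOCKET
            + hosts * WINDOWS_PER_HOST
            + sockets * VEEAM_PER_SOCKET
            + hosts * 4 * TWINAX_EACH)
    return cost, round(hosts * ghz_per_host, 1), hosts * ram_gb_per_host

# 2x E5-2630L (6 cores @ 2.4GHz) = 28.8GHz per host
print(scenario(5, 7200, 28.8, 128))    # (107000, 144.0, 640)
print(scenario(4, 9300, 28.8, 256))    # (94000, 115.2, 1024)
print(scenario(3, 11500, 28.8, 384))   # (77100, 86.4, 1152)
```

Note the itemized 3-node list actually sums to $77,100; everything else matches the totals above.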

We’ve managed to potentially save nearly $30,000, 2U of rack space, and a bunch of licencing in this scenario. There are some things to consider, however:

1) CPU requirements

What if you couldn’t tolerate the CPU resource drop? If your environment is running at 80%+ CPU usage, then memory scaling isn’t going to help you. The first thing to check, though, is whether you have a bunch of VMs that don’t have VMware Tools installed and/or are constantly pinning the CPU because of some errant task. Fix that, and you may find you don’t need the CPU you thought you did.

Upgrading to E5-2670 v2 2.5GHz 10-core CPUs brings the cost up to $13,500 per host, or $2000 extra. But you go from 28.8GHz to 50GHz per host – or 150GHz across all 3 nodes. You then ‘only’ save about $24,000 in this example – still roughly 22%.
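The upgrade math is quick to verify. A small sketch, using the itemized 3-node total ($77,100) and the figures above:

```python
# Does the 10-core upgrade still pay off?
base_total   = 107_000        # original 5-node scenario
three_node   = 77_100         # itemized 3-node total with 6-core CPUs
cpu_uplift   = 3 * 2_000      # $2,000 extra per host for E5-2670 v2
ghz_per_host = 2 * 10 * 2.5   # dual 10-core @ 2.5GHz = 50GHz per host

savings = base_total - (three_node + cpu_uplift)
print(ghz_per_host, 3 * ghz_per_host)              # 50.0 150.0
print(savings, round(savings / base_total * 100))  # 23900 22
```

So even after the CPU uplift, the 3-node design keeps roughly a 22% cost advantage.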

2) RAM gotchas

Check that populating all the memory slots doesn’t drop the memory speed, as some configurations do. That said, many environments I’ve seen would still ‘prefer’ to have MORE RAM over FASTER RAM. That preference may not hold up under measurement, but when internal customers are asking for double the memory per VM, they don’t care how fast, only how much. So you need to look at IF this will occur and IF you care.

3) N+1 sizing.

Remember that you want to do maintenance. If you have 5 hosts and take out 1, you still have 4. If you have 3, you only have 2 left. Do you still have enough capacity? Let’s look at maximum and N+1 sizes:

5 Node = 144GHz CPU / 640GB, or 115.2GHz / 512GB in maintenance

4 Node = 115.2GHz CPU / 1024GB, or 86.4GHz / 768GB in maintenance

3 Node = 86.4GHz CPU / 1152GB, or 57.6GHz / 768GB in maintenance

So again, assuming RAM is your key resource, either the 4- or 3-node option still exceeds the 5-node cluster’s capacity at 100% health, even while in maintenance.
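The N+1 figures above follow from a simple "one host out" calculation. A minimal sketch, using the same per-host numbers:

```python
# N+1 check: cluster capacity with one host in maintenance
def n_plus_1(hosts, ghz_per_host, ram_gb_per_host):
    """Return (GHz, RAM in GB) available with one host down."""
    up = hosts - 1
    return round(up * ghz_per_host, 1), up * ram_gb_per_host

print(n_plus_1(5, 28.8, 128))   # (115.2, 512)
print(n_plus_1(4, 28.8, 256))   # (86.4, 768)
print(n_plus_1(3, 28.8, 384))   # (57.6, 768)
```

Both of the denser designs keep 768GB available during maintenance – more than the 5-node cluster’s 640GB even when all five hosts are healthy.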

4) Expandability

In my example, the 24-DIMM-slot servers are maxed out from day one. The 5-node solution has 16 free slots per host and could be upgraded as required. However, it has already cost $30,000 more, so that additional memory only raises the delta. I’ve seen many environments where the attitude was “we’ll just add a host” – an attitude that fails to consider the licencing, ports, sockets, rack space, etc., required to “just add one more” – and it’s seldom just one more once you start playing that game. I’ve seen environments where people had 10 hosts with 128GB or 192GB and wanted to add 5 more, rather than just replace the first 10 hosts with something with better density.

5) Lifetime maintenance costs

Now that you’ve managed to reduce the size of your cluster by 40%, that 40% is saved year after year. Consider all the things that might need to be done on a “per host” basis – patching, firmware updates, preventative maintenance, cabling, labelling, documentation, etc. Every time you do something “cluster wide”, you’ve likely reduced that workload by some amount.

This same example works just as well if your original cluster needed to be 10 or 15 nodes and you chose to make it only 6 or 9. So this isn’t just a play for the “little guy” – and if an environment supporting 1TB of RAM is “the little guy”…

Now, one thing I see, which is unfortunately part of the “corporate budget game”, is the whole “if we don’t use the budget, we’ll lose it” scenario. It drives me up the wall, but I get it, and it exists. So let’s assume you absolutely HAD to spend that original $107K and burn through the $30K – what could you do with it?

  • Spend some money on a DR “host” with DAS. It wouldn’t run everything, but it could be better than what you don’t have today
  • Maybe VSAN starts being a plausible option? One argument I tend to hear against it, is cost. At $7500/node, 3 is much less than 5 – and you could likely pay for your disks.
  • Host based caching like PernixData could be something you throw in, to drastically assist with storage IO performance
  • Maybe those 10GbE switches you’ve been wanting can finally be bought – heck, this basically makes them “buy the servers, the switches are free”
  • Training and consulting. Maybe you could now afford to send some guys on training. Or pay to get it installed if your team doesn’t have the skillsets in house

Something to keep in mind.

