Expert Discusses High Performance Data Centres in Depth

Author: Prof. Ian Bitterlin, Associate at Portman Partners

Ian Bitterlin, of Critical Facilities Consulting – a DCA Corporate Partner – provides his view on ‘high performance’ data centres. He examines the ICT load in data centres and discusses the performance of servers, along with their power consumption.

The idea that we can describe one data centre as ‘high performance’ in relation to others raises more questions than answers, so where to start?

The most logical place is with the ICT load, or what the data centre is intended ‘for’. This can be distilled down to the three basic definitions of a data centre load: compute, storage and connectivity. For example, a social networking site based on users’ photo uploads and minor text comments, with a few buttons to click such as ‘like’, ‘report as abusive’ or ‘share’, will need yards of bandwidth and acres of storage but very little computation. At the other extreme, an onsite data centre for a particle accelerator creates a petabyte of data a minute and then expends vast computation capacity producing massive data sets, but hardly speaks to the outside world. Every combination of the three loads lies in between. Each in its own way could be classed as ‘high performance’, and perhaps we could add ‘security’ as another performance attribute?

However, to date, most data centres are not built or fitted out for specific applications. Their operators purchase commercially available integrated servers to run multi-app enterprises, multi-user colocation or ‘clouds’ with flexible configurations set by the user. Some parts of each could be classed as high-performance, but how could we rank them into low/medium/high?

The problem is that it is a constantly movable feast, with the release date of the server setting the bar. Each newer generation is faster, crunching ever more operations per second at ever fewer Watts per operation, with a minimal increase in real cost compared to the technology capacity curve. There is also a trend towards ever-lower idle power, which limits the waste when users do not utilise their hardware as they should. This server improvement trajectory is mostly ignored by people trying to criticise the rising energy demand of data centres. Imagine what the energy growth would be like if the ICT hardware were not also on an exponential improvement curve: in round terms, data traffic has been growing at 60% CAGR for the past 15 years and hardware capacity at 50% CAGR, such that data centre loads have grown at only 10% CAGR. There is now plenty of evidence that data traffic has flattened out in mature markets, including the UK, and data centre energy is stabilising for the moment – but that’s another story.
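To see how those round figures compound, here is a minimal Python sketch. It assumes, as a simplification not stated in the article, that energy demand scales with data traffic divided by hardware capacity per Watt; the function names are illustrative only. Under that assumption the net rate comes out slightly below the simple subtraction of 50% from 60%, which fits the ‘in round terms’ caveat.

```python
# Round figures quoted in the text.
TRAFFIC_CAGR = 0.60    # annual growth in data traffic
HARDWARE_CAGR = 0.50   # annual growth in hardware capacity
YEARS = 15


def net_growth_rate(demand_cagr: float, capacity_cagr: float) -> float:
    """Net annual growth when demand and capacity both compound.

    Assumption (not from the article): energy is proportional to
    traffic divided by hardware capacity per Watt.
    """
    return (1 + demand_cagr) / (1 + capacity_cagr) - 1


def compound(rate: float, years: int) -> float:
    """Total growth multiple after compounding `rate` for `years`."""
    return (1 + rate) ** years


net_rate = net_growth_rate(TRAFFIC_CAGR, HARDWARE_CAGR)
without_hardware_gains = compound(TRAFFIC_CAGR, YEARS)
with_hardware_gains = compound(net_rate, YEARS)

print(f"Net annual energy growth: {net_rate:.1%}")
print(f"Energy multiple over {YEARS} years without hardware gains: "
      f"{without_hardware_gains:,.0f}x")
print(f"Energy multiple over {YEARS} years with hardware gains: "
      f"{with_hardware_gains:.1f}x")
```

On these inputs, traffic compounding at 60% for 15 years grows more than a thousandfold, which is the scale of energy growth that the hardware improvement curve has offset.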

So how far and quickly have servers come?

A good source of server performance data is the SPEC website, where several hundred servers are listed by the OEMs along with their performance against the SPEC Power software loading routines. The SPEC Power test regimes do not represent all loads (or any specific one), but as a benchmark they are very informative, although, as usual, ‘there are other test routines available’. For example, we can use SPEC to compare two servers that are both in service today but about six years apart in market release date. I will call them Server-A and Server-B, although you can make your own comparisons by looking at the SPEC listings for free.

Server-A had a rated power consumption at 100% SPEC load of 394W and performed 26,880 operations per second – which resulted in 45 Operations/Watt. Its idle power (when doing no work at all) was 226W, a surprising 57% of its full-load power, although some other servers of the time idled at a rather depressing 79%.

In comparison, Server-B has a rated power consumption at 100% SPEC load of 329W (17% lower) and performs 4,009,213 operations per second (roughly 150x more) – which results in 12,212 Operations/Watt (270x more). Roughly five years ago, a whitepaper predicted that the lower limit for idle power in silicon chips would be 20% of full-load power, but Server-B managed 13% (44W) in 2017.
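The comparison can be checked with a short Python sketch using only the figures quoted above. The `Server` class and its field names are hypothetical, introduced purely for illustration. Note that the simple full-load ratio computed here need not match the quoted Operations/Watt values; one possible explanation (an assumption, not confirmed by the text) is that the quoted values are overall benchmark scores aggregated across several load levels rather than the ratio at 100% load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical container for the figures quoted in the text."""
    name: str
    full_load_watts: float
    idle_watts: float
    ops_per_second: float

    @property
    def idle_fraction(self) -> float:
        # Idle power as a share of power at 100% load.
        return self.idle_watts / self.full_load_watts

    @property
    def full_load_ops_per_watt(self) -> float:
        # Simple ratio at 100% load only; not necessarily the
        # benchmark's overall score (see the assumption above).
        return self.ops_per_second / self.full_load_watts


server_a = Server("Server-A", 394, 226, 26_880)
server_b = Server("Server-B", 329, 44, 4_009_213)

power_reduction = 1 - server_b.full_load_watts / server_a.full_load_watts
throughput_ratio = server_b.ops_per_second / server_a.ops_per_second
quoted_efficiency_ratio = 12_212 / 45  # Operations/Watt as quoted

for s in (server_a, server_b):
    print(f"{s.name}: idle at {s.idle_fraction:.0%} of full load, "
          f"{s.full_load_ops_per_watt:,.0f} ops/W at 100% load")
print(f"Full-load power reduction: {power_reduction:.1%}")
print(f"Throughput ratio (B/A): {throughput_ratio:.0f}x")
print(f"Quoted efficiency ratio (B/A): {quoted_efficiency_ratio:.0f}x")
```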

So, Server-B (not hugely different in purchase cost to Server-A) can, on power consumption alone, replace at least 270 modules of Server-A, a remarkable consolidation of 40 ICT cabinets into one. However, when Server-A was released it was a very popular machine, offering huge performance compared to what went before it, and at a lower cost. I remember seeing an HPC installation in the UK handling datasets for oil and gas exploration surveys using Server-A, but I assume that it has since been upgraded.
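As a rough illustration of what that consolidation implies, the following Python sketch works out the cabinet density and full-load power of the replaced Server-A estate. It uses only the figures quoted in the text; the even spread of servers across cabinets and the variable names are assumptions made for the example.

```python
# Figures quoted in the text.
server_a_units = 270            # Server-A modules replaced
cabinets = 40                   # cabinets consolidated into one
server_a_full_load_watts = 394
server_b_full_load_watts = 329

# Assumption: the Server-A units are spread evenly across the cabinets.
units_per_cabinet = server_a_units / cabinets

# Full-load power of the whole Server-A estate, in kW.
fleet_kw = server_a_units * server_a_full_load_watts / 1000
kw_per_cabinet = fleet_kw / cabinets

print(f"Server-A units per cabinet: {units_per_cabinet:.2f}")
print(f"Server-A estate at full load: {fleet_kw:.1f} kW "
      f"({kw_per_cabinet:.2f} kW per cabinet)")
print(f"Single Server-B at full load: {server_b_full_load_watts / 1000:.3f} kW")
```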

So, what ‘was’ high-performance is now painfully slow compared to a modern machine, and a data centre using the latest commercial hardware is capable of very high performance indeed.

That is not to say that every server suits every application or type of load; quite the opposite. Nor is it true that software is being improved in resource efficiency; again, the opposite is the reality. Matching the server to the application, and ensuring high utilisation through right-sizing and virtualisation, is a key feature of the open-source designs of Facebook, although it has to be noted that having such a single and simple application helps them achieve very high performance that few others are able to emulate. But that must be the target: refreshing the hardware in under three years and virtualising heavily must become the norm in enterprise and colocation facilities if we are to meet our zero-carbon targets.

This blog was written by Prof. Ian Bitterlin, a Chartered Engineer with more than 27 years’ experience in data centre power and cooling, following 25 years in rotating electrical machines and systems. You can find the original blog here.