Author: Ian Bitterlin, Portman Partners Associate
In Part III of this series, we considered the first step towards sustainability: reducing load. The conclusion we drew was that a reduction in the demand for ever-more ICT services in the wealthier parts of the world will not happen without a rapid acceleration in climate change and a substantial shift in public opinion and behaviour. So, for sustainable data centres, we now need to concentrate on the data centre itself, and the second step towards sustainability is to improve the process.
In the first part of this series, I commented on data centres’ PUE and its stagnation over the past few years. However, PUE is still valuable for keeping an eye on the energy expended in the infrastructure that keeps the ICT services up and running, often continuously. In my 33 years of experience in and around data centres, I have yet to find a developer or client who places low PUE as the top priority in design. First, as always, comes reliability/availability, followed by affordability, scalability and, later, energy effectiveness. Clearly, no one wants to waste energy, but balancing low-energy design against risk, real or perceived, to the ICT load or service leads to a higher PUE. Good design can minimise the increase. Discussions with many contacts in the industry show that this view is widely held but rarely expressed in public.
We can compare ourselves to low-energy housing. Once you have changed the energy behaviour of the occupants (for example, dressing for the weather), the rest involves high insulation, energy recovery in ventilation, and a construction designed to minimise energy consumption in the location's climate: less glass, more sheep's wool.
In data centres, the equivalent is not only PUE but, more importantly, choosing the right/best ICT hardware for your applications, virtualising it heavily, and making sure that the utilisation is as high as it can be. If your application involves low utilisation of the ICT hardware, then you should make sure that the ICT hardware has a low idle power. Not all servers are created equal and, while not so long ago servers idled at over 75% of full-load power, the best-in-class of 2021 idles at only 13%. This is one of the (good) reasons you may consider putting your load onto a cloud platform – the utilisation is guaranteed to be high – although data security and service reliability may also come into the choice.
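To see why idle power matters so much at low utilisation, a rough sketch may help. It assumes, purely for illustration, that server power rises linearly from idle to full load (a common simplification, not something stated above) and uses a hypothetical utilisation of 10%. The idle fractions of 75% and 13% are the figures quoted above; the function names are invented for this example.

```python
def relative_power(utilisation, idle_fraction):
    """Power draw as a fraction of full-load power.

    Assumes a linear rise from idle to full load (an illustrative
    simplification, not a measured server power curve).
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return idle_fraction + (1.0 - idle_fraction) * utilisation


def power_per_unit_work(utilisation, idle_fraction):
    """Relative power spent per unit of useful work delivered."""
    return relative_power(utilisation, idle_fraction) / utilisation


UTILISATION = 0.10   # hypothetical lightly loaded application
OLD_IDLE = 0.75      # older servers: idle at 75% of full-load power
NEW_IDLE = 0.13      # best-in-class 2021: idle at 13% of full-load power

old_draw = relative_power(UTILISATION, OLD_IDLE)
new_draw = relative_power(UTILISATION, NEW_IDLE)

print(f"Old server at 10% utilisation: {old_draw:.1%} of full-load power")
print(f"New server at 10% utilisation: {new_draw:.1%} of full-load power")
print(f"Reduction in draw for the same work: {1 - new_draw / old_draw:.0%}")
```

Under these assumptions the older server draws 77.5% of its full-load power to do 10% of its possible work, while the low-idle server draws 21.7%. The gap narrows as utilisation rises, which is why high utilisation and low idle power are two routes to the same end.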
However, before choosing the best hardware for your application(s), you should remove all redundant hardware. Respondents to a well-known Uptime Institute study said that 20% of all servers were 'comatose' (some call them 'zombies') – doing nothing except being backed up and virus checked. Compare the investment required to reduce the load by 20% with that of busting a gut to improve your PUE from 1.6 to 1.5, not to mention the risks. Then, before anything else, check that all of your ICT inventory is less than 2-3 years old. Moore's Law and its derivatives are all in good health as far as the technology development curve in server capacity and speed is concerned. Regularly refreshing your hardware will cut consumption per unit of compute, storage, and comms. In fact, there is much evidence to show that businesses not involved with video, including finance/banking applications, are witnessing their data centre loads go down, despite their demand for ICT applications continuing to increase.
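The comparison between removing comatose servers and chasing PUE can be put into numbers. The sketch below uses the standard relationship that total facility power is the ICT load multiplied by PUE. The 1,000 kW ICT load is a hypothetical figure chosen for easy arithmetic, and PUE is assumed to stay at 1.6 after the servers are removed, which is optimistic because PUE tends to rise at partial load.

```python
def facility_power_kw(it_load_kw, pue):
    """Total facility power = ICT load x PUE."""
    return it_load_kw * pue


IT_LOAD_KW = 1000.0        # hypothetical ICT load
COMATOSE_FRACTION = 0.20   # share of servers reported as comatose

baseline_kw = facility_power_kw(IT_LOAD_KW, 1.6)

# Option A: switch off the comatose servers, PUE assumed unchanged at 1.6.
cleanup_kw = facility_power_kw(IT_LOAD_KW * (1 - COMATOSE_FRACTION), 1.6)

# Option B: keep every server and improve PUE from 1.6 to 1.5.
better_pue_kw = facility_power_kw(IT_LOAD_KW, 1.5)

cleanup_saving = 1 - cleanup_kw / baseline_kw
pue_saving = 1 - better_pue_kw / baseline_kw

print(f"Baseline:              {baseline_kw:.0f} kW")
print(f"Remove comatose load:  {cleanup_kw:.0f} kW (saves {cleanup_saving:.1%})")
print(f"PUE 1.6 -> 1.5:        {better_pue_kw:.0f} kW (saves {pue_saving:.2%})")
```

On these assumptions, removing the comatose 20% saves 20% of facility power, while the PUE improvement saves 6.25%. Even if PUE worsens slightly once the load is removed, the load reduction remains the larger saving.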
To lower the PUE, we know exactly what to do, and if we don't, we can download the free EU CoC on data centre efficiency, which includes c.300 ways of getting PUE down. But beware: getting the PUE down in most facilities is no more complicated than increasing the load, or even not removing comatose servers or not virtualising – because, perversely, PUE increases under partial load.
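The perverse behaviour of PUE under partial load follows from fixed standing losses. As a minimal sketch, assume the facility overhead has a fixed part that runs regardless of load and a part proportional to the ICT load. The 200 kW fixed overhead and the 0.3 proportional factor are hypothetical values chosen only to illustrate the effect, not data from any real facility.

```python
def facility_power_kw(it_load_kw, fixed_overhead_kw=200.0, proportional_overhead=0.3):
    """ICT load plus a fixed standing loss plus a load-proportional overhead.

    Both overhead parameters are hypothetical illustration values.
    """
    return it_load_kw + fixed_overhead_kw + proportional_overhead * it_load_kw


def pue(it_load_kw, fixed_overhead_kw=200.0, proportional_overhead=0.3):
    """PUE = total facility power / ICT power."""
    total = facility_power_kw(it_load_kw, fixed_overhead_kw, proportional_overhead)
    return total / it_load_kw


for load_kw in (1000.0, 800.0, 500.0):
    print(f"ICT load {load_kw:6.0f} kW -> "
          f"facility {facility_power_kw(load_kw):6.0f} kW, PUE {pue(load_kw):.2f}")
```

In this model, cutting the ICT load from 1,000 kW to 800 kW lowers facility power from 1,500 kW to 1,240 kW, yet PUE worsens from 1.50 to 1.55. The metric gets worse while the energy bill gets better, which is why PUE alone is a poor target.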
One central design point in data centres is the level of resilience (the Tier or Availability Class of EN 50600), and this will impact the lower limit of the PUE. The higher the availability you design for, the higher the potential PUE will be, because you will have redundant M&E plants running with increased standing losses. Good design can minimise the effect but not eliminate it.
Be aware that small items in the CoC can be challenged, such as the suggestion that white-painted ICT cabinets save energy by reducing lighting power. And some things can be added, such as recommendations for removing the front and rear doors because they impede airflow and increase fan load in both the cooling system and the ICT load itself. My apologies if this is one of the 300, but last time I checked, it was not. If you have an idea, then let the CoC admin know (Mark Acton, amongst others), and they will include it in their annual review – a feature that is unique to the CoC and serves to keep it best practice rather than 'so last year.'
But, assuming we 'behave,' we should start with Thermal Management: cold-aisle/hot-aisle, containment, and blanking plates to eliminate bypass air and maximise the waste air temperature, which will enable us to maximise free-cooling and so reduce mechanical refrigeration. Then, although not popular, enable eco-mode UPS, and make all plants modular and scalable to maximise the load on the plant that is running, automatically shutting down capacity that is not needed.
Much of Thermal Management is based on pushing the thermal envelope of the ICT hardware. Yet, much of the industry is still cautious about taking the well-documented (in the ASHRAE Thermal Guidelines 'Allowable' Range) risks associated with higher temperature and relaxed humidity. No one outside of a data centre business can criticise that reluctance. Risks can be real or perceived, but all end up being real, and offering clients increased risk in return for saving money has proven to be a hard sell. The fact remains that 1 kWh expended in a (UK) data centre enables/protects an average of £150 of business value – the cost of the energy being 0.1% of the business value – hence saving energy is only attractive if nothing is changed or risked in the ICT service. It should not be a surprise to see that improving the process efficiency starts with the ICT load rather than the PUE and that, once you have got your system aligned with your business, your PUE is hard to reduce further – hence the stagnation.
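The arithmetic behind that hard sell can be sketched briefly. The £150 of business value per kWh and the 0.1% cost share are the figures quoted above, and the energy price is simply what those two figures imply. The 10% energy saving is a hypothetical value used only to show the scale of the reward relative to the value at risk.

```python
BUSINESS_VALUE_PER_KWH_GBP = 150.0   # figure quoted in the text
ENERGY_COST_SHARE = 0.001            # 0.1% of business value, as quoted

# Energy price implied by the two quoted figures.
implied_price_gbp = BUSINESS_VALUE_PER_KWH_GBP * ENERGY_COST_SHARE

# Hypothetical efficiency measure that trims energy use by 10%.
HYPOTHETICAL_SAVING = 0.10
saving_per_kwh_gbp = implied_price_gbp * HYPOTHETICAL_SAVING
saving_share_of_value = saving_per_kwh_gbp / BUSINESS_VALUE_PER_KWH_GBP

print(f"Implied energy price: £{implied_price_gbp:.2f} per kWh")
print(f"Saving from a 10% energy cut: £{saving_per_kwh_gbp:.3f} per kWh")
print(f"Saving as a share of business value: {saving_share_of_value:.2%}")
```

On these figures, a 10% energy saving is worth about 1.5p for every £150 of business value it touches, roughly 0.01%, which helps explain why any added risk to the ICT service outweighs it in the client's mind.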
The third step in our search for sustainability, which we will review in Part IV of this series, is, finally, to power the system with renewable energy, expending a valuable resource on a reduced and optimised load. The equivalent in a low-energy home would be converting it to all-electric with ground-source heat pumps, heat recovery into the ventilation, LED lighting, etc., all fed from a 100% renewable grid.