Until recently, many organisations didn't have to think about their data centre infrastructure more than once a decade. As long as there was enough space to house the new server rack, cooling and power needs would work themselves out. But those times are quickly passing as the demand for computing power increases and puts a strain on electricity supplies. According to market research firm IDC, computer support infrastructure needed to house and run servers is second only to system price among the concerns of data centre managers. Steve Conway, a research vice president for high-performance computing at IDC, says, "These issues were at the number-12 position just three to four years ago, which means they were a non-issue."
This change in priority reflects shifts in technology and a sharp growth in demand for processing power. Virtualisation and multicore processors are allowing us to put dramatically more power in a smaller footprint. And the increasing degree to which businesses of all types rely upon connected computing for core business processes has driven enterprises to push ever more racks of computers into their existing data centres. Meanwhile, Gartner predicts that by next year, half the world's data centres will not have the infrastructure needed to meet the power and cooling requirements of the latest high-density equipment.
These changes bring to managers of mainstream data centres issues that managers like myself in high-end scientific and technical supercomputing centres have been dealing with for decades: How to properly site infrastructure support equipment, optimize cooling for high server rack densities, balance data centre efficiency against business needs and track all the little details that can make or break an implementation.
The data centre in which I work, a Department of Defense supercomputing centre located at the Army Engineer Research and Development Center (ERDC), is in the middle of a two-year effort to totally overhaul its data centre support infrastructure. Designing a new data centre or retrofitting an old one is a complex process, but the six ideas below, road-tested by our experiences during the past decade and informed by ERDC's ongoing infrastructure modernization, will get you started in the right direction.
1. Decide whether you really need your own data centre. Growing your computing infrastructure is a challenging, expensive process. Before you commit to your next upgrade, ask yourself, "Do I even need my own data centre?"
A minimally robust infrastructure is going to include power-switching equipment and generators. But almost no one stops there. Added fault tolerance includes batteries or flywheels for the uninterruptible power supply (UPS), reserve water supplies in case your utility water is interrupted, redundant components, and possibly even multiple independent commercial power connections. Then you have to protect yourself from fire and natural disasters. And once the data centre is built, you're going to have to field a crew to monitor and maintain it.
As Amazon CTO Werner Vogels said at the recent Next Generation Data Center conference, unless you're in an industry where having a highly efficient, in-house data centre translates directly into revenue, you might be better off running your applications in someone else's data centre.
This solution isn't right for everyone, but as utility costs rise and growing demand squeezes the support infrastructure even tighter, it's worth considering.
2. Weigh the costs and benefits of green design. Rising costs and consumption rates are driving electricity concerns to the front of IT planning conversations. Items like transformers, electrical wiring, cooling and UPS can have large, fixed electrical losses, taking a slice off your available power before it gets to the first server.
The Green Grid, a consortium of information technology companies interested in improving data centre energy efficiency, recommends right-sizing your infrastructure: eliminating redundant components and installing only the equipment you need to make your data centre run today. According to the group's Guidelines for Energy-Efficient Data Centers, right-sizing the infrastructure can save as much as 50 percent off the electric bill.
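To see why fixed losses make right-sizing pay off, here is a rough sketch of how much of the incoming power actually reaches the servers. All of the loss figures are hypothetical illustrations, not numbers from The Green Grid, and the function name is invented for this example.

```python
def delivery_efficiency(it_load_kw, fixed_loss_kw, proportional_loss=0.05):
    """Fraction of input power that reaches the IT equipment.

    fixed_loss_kw: losses that are present regardless of load
        (transformers, UPS overhead). Hypothetical value.
    proportional_loss: losses that scale with load (wiring).
        The 5 percent default is an assumption.
    """
    input_kw = it_load_kw + fixed_loss_kw + proportional_loss * it_load_kw
    return it_load_kw / input_kw


# Same 200 kW of servers behind two different support infrastructures.
oversized = delivery_efficiency(200, fixed_loss_kw=100)
right_sized = delivery_efficiency(200, fixed_loss_kw=40)
print(f"oversized: {oversized:.1%}, right-sized: {right_sized:.1%}")
```

Under these assumed numbers, the oversized infrastructure delivers roughly 65 percent of its input power to the servers, against 80 percent for the right-sized one. The fixed losses are paid whether or not the capacity behind them is being used.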
But there is another wrinkle to the energy story that's only just started to work its way into data centre planning: Our national utility infrastructure is starting to show signs of wear.
The bridge collapse in Minneapolis and massive power outages in the early years of this decade are symptoms of a rapidly declining national critical infrastructure. Events like the outage on Aug. 14, 2003, which left 50 million people around the Great Lakes without power, are expected to become more common over the next several years unless significant steps are taken to rein in demand and increase the capacity and reliability of our aging power grid.
According to the most recent report of long-term power utility reliability by the North American Electric Reliability Council, demand for electricity is expected to grow 19 percent in the next 10 years, but generation capacity is expected to grow only by 6 percent. This means that the capacity margin is decreasing every year, and surges in demand or regional weather events are more likely than ever to cause outages around the country.
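The arithmetic behind the shrinking margin can be sketched as follows. The 19 percent and 6 percent growth figures come from the report cited above. The 15 percent starting margin is purely an assumption for illustration, and capacity margin is taken here as (capacity - peak demand) / capacity.

```python
def capacity_margin(capacity, peak_demand):
    """Spare generation capacity as a fraction of total capacity."""
    return (capacity - peak_demand) / capacity


def projected_margin(initial_margin, demand_growth, capacity_growth):
    """Capacity margin after demand and capacity have both grown.

    initial_margin is an assumed starting point, not a reported figure.
    Growth rates are totals over the whole period, e.g. 0.19 for 19 percent.
    """
    capacity = 1.0  # normalised; only the ratios matter
    demand = capacity * (1 - initial_margin)
    return capacity_margin(capacity * (1 + capacity_growth),
                           demand * (1 + demand_growth))


margin = projected_margin(initial_margin=0.15,
                          demand_growth=0.19,
                          capacity_growth=0.06)
print(f"margin after 10 years: {margin:.1%}")
```

With the assumed 15 percent starting point, the margin falls to roughly 4.6 percent by the end of the period. A different starting margin changes the result, but the direction is the same whenever demand outgrows capacity.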
With utility power disruption likely to become more frequent in the near term, data centre managers should be motivated to design their infrastructure with power availability in mind, including redundant power distribution and generation systems to protect against system failure in the face of commercial power interruptions.
Clearly, you need to design your infrastructure to be as efficient as possible (even taking steps like specifying high-efficiency power supplies in servers). But the degree to which you can green your power distribution infrastructure will depend upon the value of continuous availability to your organization and the costs of expanding capacity. For example, our supercomputing mission at the ERDC requires very robust availability of our computers. Our electrical distribution infrastructure features redundant switches, batteries and electrical generators. These enable us to perform routine maintenance without exposing operations to an interruption, as well as to continue emergency operations over long periods during which a failure in one of these components might be expected. This increases our fixed electrical losses, but is unavoidable given our operational requirements.
3. Improve flexibility by designing for closely coupled cooling. Computers are very efficient at two things: crunching numbers and turning electricity into heat. About 30 percent of the power that goes into the data centre is turned into heat inside servers.
The traditional approach to cooling puts large chillers outside the facility to cool water, which is then pumped to computer room air-conditioning (CRAC) units on machine room floors. This approach essentially floods the entire room with cold air and offers very little flexibility for targeting specific hot spots.
The concept of "closely coupled cooling" has moved in and out of fashion over the years in supercomputing centres; we have found it to be efficient and effective. The idea is to put cooling very near the source of the heat it is meant to remove. This approach allows for targeted cooling and control of hot spots, and can result in shorter air paths that require less fan power to move the cold air around the room. Closely coupled cooling can allow for rack densities of up to four times the density of a typical room-oriented cooling solution. As customer demand pushes rack densities up, all the major server vendors are now offering closely coupled configurations.
There are many rack- and chip-based closely coupled cooling solutions. For example, there are designs that install cooling in a rack form factor alongside server racks, or place it at the top of each rack for a "top to bottom" approach. There are also solutions that deliver chilled water directly to the rear door of racks, or interleave coolers in drawers inside racks alternating with drawers of computers.
Chip-based cooling solutions come in two basic varieties. The simplest deliver cool water to one or more radiators located over heat sources in your server. More complex systems use inert liquid that is directly applied to server chips in a closed loop system. Although this technology has only recently been adopted for commodity servers, the supercomputing industry has been using this technique for decades. The ERDC supercomputing centre was using chip-level vaporization heat exchange in some of its Cray supercomputers last year.
All of these options require plumbing for chilled water right to the computer racks, and you need to plan for this as you are designing your data centre plumbing. If the thought of moving water into the heart of your data centre causes your heart to skip a beat, fear not: There is a large body of engineering knowledge about how to minimize the risks. You'll want to take steps like keeping your water pipes as low as possible under your raised floor, installing leak detectors, isolating electrical runs from plumbing pipes, and providing leak-containment features like gravity drains and drip pans.
4. Think about the floor tiles: It's the little stuff that matters. If you are not planning or cannot plan for closely coupled cooling, there are still steps you can take to improve the efficiency of your cooling.
Plan to minimize the profile of cables and pipes you put under the raised floor in your machine room. This is the space that your CRAC units are using to push cold air toward your computers, and the effectiveness of energy used in cooling can be greatly increased if you can minimize the obstructions that air encounters. Minimizing under-floor obstructions can also help eliminate data centre hot spots and prevent air handlers from working against one another.
Another step you can take is to commission a fluid dynamics study of your data centre or buy the software you need to perform that study yourself. This approach uses a computer model to simulate the flow of air around your data centre and can help you identify the causes of and solutions to cooling problems, including the optimum placement of perforated floor tiles.
The ERDC supercomputing centre adopted this approach several years ago to make sure we were getting the most out of our cooling. Perforated tiles are often simply lined up in front of the racks on the cold aisle of servers in a data centre. "Surprisingly," says Paula Lindsey, integration lead for the centre, "the most effective placement of perforated tiles isn't always right in front of the machine." The fluid dynamics study showed that we needed to increase the diameter of perforations in some tiles, and concentrate additional courses of perforated tile in critical areas.
5. Move support equipment outside. Properly siting your computer infrastructure support systems will improve efficiency and make it easier for you to expand capacity in the future. One of the most important steps you can take is to move as much of your power and cooling equipment as possible out of your data centre. In fact, if you have the space, a good solution is to move the bulk of these items outside the building.
Here's an example. When we needed a short-term fix to get 2 megawatts of additional power for a new supercomputer at ERDC, we found that we needed to add UPS and generator equipment that would not fit into the building that housed the rest of the electrical distribution infrastructure. This problem was compounded by the siting of the building 10 years ago between the foot of a steep hill and a road. The solution, which was to put the equipment outside in an area created by cutting into the hill, was expensive and added time delays to an already tight schedule.
Our new long-term design places most of those components outside the building in modular units in a newly created utility field. "This move eliminates the constraints that building walls place on us when we need to increase our capacity and should give us the flexibility we need for at least another decade," says Greg Rottman, the engineer in charge of implementing the upgrade.
Moving distribution and support equipment outside is also eco-friendly. In a report published earlier this year, The Green Grid found that as much as 25 percent of the electricity going into the data centre is converted to heat in power distribution units, UPS equipment and switchgear. Moving this equipment out of the data centre, and outside the building if possible, decreases your overall energy consumption by eliminating the need to remove the heat generated by these components.
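A back-of-the-envelope sketch shows the size of the saving. The 25 percent figure is the upper bound quoted above. The cooling coefficient of performance (COP, units of heat removed per unit of electricity spent on cooling) of 3.0 is an assumption, as are the function name and the 2,000 kW example load.

```python
def cooling_power_avoided_kw(input_kw,
                             distribution_loss_fraction=0.25,
                             cooling_cop=3.0):
    """Cooling electricity no longer needed once distribution gear is moved out.

    distribution_loss_fraction: share of input power turned into heat in
        PDUs, UPS equipment and switchgear (0.25 is the quoted upper bound).
    cooling_cop: assumed efficiency of the room cooling plant.
    """
    heat_kw = input_kw * distribution_loss_fraction
    return heat_kw / cooling_cop


saved = cooling_power_avoided_kw(2000)
print(f"cooling power avoided: {saved:.0f} kW")
```

For a hypothetical 2,000 kW facility, up to 500 kW of heat leaves the machine room along with the equipment, which under the assumed COP is roughly 167 kW of cooling power the room no longer has to draw.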
6. Monitor for power management. Do you know how much power you are using? Are your servers pulling more or less electricity than the vendor specs say they should be? How close to your facility's power capacity will that next machine upgrade put you?
An infrastructure monitoring system for the power and cooling systems needs to be part of any upgrade you are planning. Actively managing and monitoring your energy usage will help you plan for the future and assess the effectiveness of steps you take to improve your data centre's efficiency.
Convincing senior managers to fund data centre improvements not directly related to business delivery can be a challenge. You may have to build your monitoring system piece by piece as you can afford it. But it makes sense to add your power monitoring to your data centre before you undertake major changes designed to save energy and improve efficiency. This will allow you to establish a meaningful baseline from which to judge the effectiveness of your changes, and more effectively plan for the future.
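The questions at the start of this section reduce to a few simple calculations once you have meter readings. The sketch below is a minimal illustration, not a monitoring product: the function names, the readings and the capacity figure are all hypothetical.

```python
from statistics import mean


def baseline_kw(readings_kw):
    """Average draw over a set of meter readings: the baseline to judge changes against."""
    return mean(readings_kw)


def spec_deviation(measured_kw, spec_kw):
    """How far a server's measured draw is from the vendor spec, as a fraction.

    Positive means the server pulls more than the spec says it should.
    """
    return (measured_kw - spec_kw) / spec_kw


def utilisation_after_upgrade(baseline, new_load_kw, facility_capacity_kw):
    """Fraction of facility power capacity in use once the new machine is added."""
    return (baseline + new_load_kw) / facility_capacity_kw


readings = [800, 820, 780]  # hypothetical facility readings in kW
base = baseline_kw(readings)
print(f"baseline: {base} kW")
print(f"spec deviation: {spec_deviation(4.6, 4.0):.0%}")
print(f"utilisation after upgrade: {utilisation_after_upgrade(base, 150, 1000):.0%}")
```

In this made-up case the baseline is 800 kW, one rack is drawing about 15 percent more than its spec, and a 150 kW upgrade would take a 1,000 kW facility to 95 percent of capacity. Taking the baseline before you make efficiency changes is what lets you measure their effect afterwards.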
John West is the director of the Department of Defense High Performance Computing Center at the US Army Engineer Research and Development Center in Vicksburg, Miss. He is the author of The Only Trait of a Leader, a field guide to success for new engineers, scientists and technologists.