The impact of AI on data centers and power designers
The development of AI
Broadly speaking, AI is about simulating intelligent behavior in machines. Colloquially, the term artificial intelligence is applied when a machine mimics cognitive functions that humans associate with other human minds, such as learning and problem solving.
AI, as a concept, has been around since the development of the electronic computer. It was first coined as a term in the 1950s, and led to the development of computer programs deliberately engineered to mimic the problem-solving skills of a human being. However, despite a flurry of research activity in the 1960s and 1970s, hardware and software limitations meant these early computer programs could only perform a narrow set of tasks, often still needing human intervention.
More recently though, advances in computational power have re-ignited interest in AI, and most of the world's biggest companies now have specific research departments devoted to furthering the concept – from driverless cars, to real-time voice translation services, to voice-activated intelligent assistants like Siri and Alexa.
Many of these new AI-based products and services rely heavily on the cloud. AI can be extremely compute-intensive, and the local or edge devices rarely have enough performance to handle everything on their own. For example, your smart speaker can probably recognize the wake word, such as "Alexa", locally. However, more sophisticated sentences and queries need to be transmitted across the internet for processing in the cloud at server level to formulate an analytical response.
Similarly, in B2B applications, locally gathered data will typically need to be sent to servers, either centrally for an individual company, or located in a cloud data center. For example, a factory may have hundreds of sensors gathering data that can be used for tracking efficiency and for predictive maintenance – with all the AI processing done remotely in a cloud service such as Amazon's AWS. AI is also helping make our cities smart – controlling traffic signals to help manage the smooth flow of cars, for instance – and enabling smart buildings, with AI-controlled heating, lighting and air conditioning.
Energy requirements for AI in data centers
Data centers use huge amounts of electricity – to power the IT equipment itself, but also to run cooling and air conditioning systems. Estimates vary for data center usage as a percentage of the global total demand for electricity, but a recent report from Yale Environment 360 put the figure at 2%, with a surge in internet traffic and video conferencing due to COVID-19 pushing this number upwards. The report also predicts that internet traffic will double between 2020 and 2022, with a similar rise in IoT connections – all driving rapidly growing demand for data center services.
Thanks to efficiency improvements, all of this extra activity is not, in fact, leading to a big increase in power usage, which is expected to stay flat, at least in the near future. As well as improvements in the efficiency of IT equipment, this is being helped by a transition from smaller data centers to large-scale cloud data centers, and by power savings due to server virtualization. Data center efficiency is usually measured by a figure called power usage effectiveness (PUE), which is the ratio of the total power used by the data center to the power used by the IT equipment itself, so a value of 1.0 would mean no overhead at all. This figure has been driven down from around 2.0 to typically 1.2 for large, modern sites.
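To make the PUE figures concrete, here is a minimal Python sketch of the ratio and of the overhead it implies. The function names and the 1 MW IT load are illustrative assumptions, not figures from any particular site; only the PUE values of 2.0 and 1.2 come from the text.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Power spent on everything except the IT load (cooling, losses, etc.)."""
    return it_equipment_kw * (pue_value - 1.0)


if __name__ == "__main__":
    it_load_kw = 1000.0  # hypothetical 1 MW IT load, chosen for illustration
    for value in (2.0, 1.2):  # the two PUE figures quoted in the text
        print(f"PUE {value}: overhead of about {overhead_kw(it_load_kw, value):.0f} kW")
```

Under these assumptions, moving from a PUE of 2.0 to 1.2 cuts the non-IT overhead for the same IT load from roughly 1,000 kW to roughly 200 kW.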
What about AI? As you might expect, it’s pushing electricity usage ever upwards. Industry body AFCOM estimates that an average rack in a data center uses around 7 kW of power, while AI applications commonly use more than 30 kW per rack. That’s quite a jump, and it remains to be seen whether the efficiencies mentioned above will continue to keep overall demand constrained as AI usage increases.
This increase in power usage for AI is in large part due to its demands for lots and lots of processing performance, with high-speed CPUs and GPUs. Every time we ask our smart speakers to answer a question, or YouTube recommends a new video for us, these processors need to kick into action to provide the AI behind these services.
How power systems are keeping up
Any improvement in efficiency of the power supplies provides a double benefit, both in reducing the electricity used by the IT kit, and also reducing how much cooling is needed. With increasing pressure on companies to reduce their carbon emissions, it’s not just about money – it’s essential to reduce power consumption as far as possible for environmental reasons too.
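The double benefit can be illustrated with a rough Python sketch. It assumes, as a simplification, that cooling and other overhead scale linearly with the power drawn by the IT equipment (a fixed PUE). The 30 kW load, the 94% and 98% supply efficiencies and the function name are hypothetical values chosen for illustration.

```python
def facility_power_kw(load_kw: float, supply_efficiency: float, pue: float) -> float:
    """Facility power needed to deliver load_kw to the electronics.

    Simplifying assumption: overhead such as cooling scales with IT input
    power, so the facility draw is the IT input power multiplied by the PUE.
    """
    if not 0.0 < supply_efficiency <= 1.0:
        raise ValueError("supply_efficiency must be in (0, 1]")
    it_input_kw = load_kw / supply_efficiency  # losses in the power supplies
    return it_input_kw * pue                   # plus cooling and other overhead


if __name__ == "__main__":
    load_kw = 30.0  # hypothetical rack load
    pue = 1.2       # typical modern figure quoted earlier in the article
    before = facility_power_kw(load_kw, 0.94, pue)  # hypothetical efficiency
    after = facility_power_kw(load_kw, 0.98, pue)   # hypothetical efficiency
    print(f"Facility draw falls from {before:.2f} kW to {after:.2f} kW")
```

Because the saving in supply losses is multiplied by the PUE, each watt no longer wasted in the power supply saves more than one watt at the facility level in this model.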
In addition, as the rack power level moves from <10 kW to >30 kW for compute-intensive AI, 12V is transitioning to 48V as the preferred DC distribution voltage within the rack. For the same power, a 48V bus carries a quarter of the current of a 12V bus, so the resistive (I²R) losses in the distribution network fall by a factor of 16. This reduction typically exceeds the additional board-level 48V to 12V conversion loss, giving a net system efficiency gain, but 48V distribution increases the need for high-efficiency 48V conversion.
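As a rough illustration of the resistive-loss argument, the Python sketch below compares distribution losses at 12V and 48V for the same load. The 30 kW figure is the rack power quoted in the text; the 0.1 milliohm path resistance and the function name are hypothetical values picked only to make the arithmetic visible.

```python
def distribution_loss_w(load_power_w: float, bus_voltage_v: float,
                        resistance_ohm: float) -> float:
    """Resistive loss (current squared times resistance) in a DC distribution path."""
    current_a = load_power_w / bus_voltage_v  # I = P / V
    return current_a ** 2 * resistance_ohm    # P_loss = I^2 * R


if __name__ == "__main__":
    rack_power_w = 30_000.0       # 30 kW rack, as quoted in the text
    path_resistance_ohm = 0.0001  # hypothetical 0.1 milliohm distribution path

    loss_12v = distribution_loss_w(rack_power_w, 12.0, path_resistance_ohm)
    loss_48v = distribution_loss_w(rack_power_w, 48.0, path_resistance_ohm)

    print(f"12V bus loss: {loss_12v:.1f} W")
    print(f"48V bus loss: {loss_48v:.1f} W")
    print(f"Reduction factor: {loss_12v / loss_48v:.1f}")
```

The reduction factor is (48/12)² = 16 whatever resistance is assumed. Whether the saving outweighs the extra 48V to 12V conversion loss depends on the real path resistance and converter efficiency, which this sketch does not model.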
For general data center loads, such as storage and CPUs in servers, there’s a need for a 12V regulated rail, which can deliver ever-increasing levels of power to meet the increasing demands of high-performance IT kit. This means that power systems need to maximize power density, so everything can fit into the smallest space possible, while maintaining efficiency – thus keeping overall costs lower.
To achieve this kind of high efficiency and density, without compromising on thermal performance, new approaches are needed. For instance, Flex Power Modules has developed Hybrid Regulated Ratio (HRR) converters, which combine the benefits of fixed ratio DC/DC conversion with those of full regulation – enabling digital DC/DC converters such as Flex Power Modules' BMR491 to achieve efficiencies of 98% and above, while delivering up to 2.4 kW of peak power in a quarter brick package.
Another trend that has helped reduce power system losses over the past few years is the elimination of the isolation requirement for 48V power supplies in data center racks. High-efficiency Intermediate Bus Converters (IBCs) such as Flex Power Modules' BMR310 Switched Capacitance Converter (SCC) use a non-isolated topology to provide a 12V supply locally from the 48V supply where required. For AI applications, and specifically high-performance GPUs, SCCs provide an unregulated and non-isolated supply, which enables very high efficiency, very low height, and a compact overall size.
The 48V to intermediate bus voltage conversion then needs a final conversion stage for the load. Today’s IT boards are complex and contain diverse functional blocks that often have different power requirements. Point of Load (PoL) converters, such as Flex Power Modules' BMR474, take the 12V input and provide the low-voltage outputs for individual rails, and can be optimized for each rail through PMBus configuration.
There is also a growing need for very high currents for the workhorse processors, which are still best served by multi-phase buck converters. The BMR510, for example, is a two-phase voltage regulator module (VRM) that operates on an input range of 4.5 V to 16 V and is designed for demanding high-power applications such as CPUs and GPUs. This power module incorporates two buck power stages, and each phase can deliver up to 70 A peak. Multiple two-phase modules can be combined to provide a multi-phase system with the number of phases needed to meet the total thermal and peak current requirements. The BMR510 is a stacked design that mounts two integrated power stages on top of the inductor, providing a PCB space-efficient module that is optimized for top-side cooling.
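The module-count arithmetic for such a multi-phase system can be sketched in Python as follows. This is a first-pass estimate based only on the peak-current figures quoted above (two phases per module, up to 70 A peak per phase). The 500 A processor demand and the function name are hypothetical, and a real design would also have to check thermal limits, which may call for more phases.

```python
import math

PHASES_PER_MODULE = 2            # two buck power stages per module (from the text)
PEAK_CURRENT_PER_PHASE_A = 70.0  # peak current per phase (from the text)


def modules_needed(peak_load_current_a: float) -> int:
    """Minimum number of two-phase modules to cover a peak load current.

    Considers peak current only; thermal derating is deliberately ignored.
    """
    if peak_load_current_a <= 0:
        raise ValueError("peak load current must be positive")
    per_module_a = PHASES_PER_MODULE * PEAK_CURRENT_PER_PHASE_A
    return math.ceil(peak_load_current_a / per_module_a)


if __name__ == "__main__":
    demand_a = 500.0  # hypothetical processor peak current
    count = modules_needed(demand_a)
    print(f"{demand_a:.0f} A peak needs {count} modules "
          f"({count * PHASES_PER_MODULE} phases)")
```

For the hypothetical 500 A load this gives four modules, or eight phases, since three modules would supply only 420 A peak.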
AI has moved quickly from concept to reality, and is now a familiar, widely deployed technology. While overall data center power consumption is staying flat thanks to improved efficiency, there’s potential for AI to drive electricity usage up at a rapid pace.
Power system designers need to keep reducing losses in their power sub-systems, to keep costs and carbon emissions under control – and they need to be on the lookout for new technologies and components that will help them hit that goal.