What's driving the need for peak power in data centers?

Blog • February 03, 2022

The high clock speeds of the latest CPUs and GPUs enable intensive processing tasks such as AI and gaming. This comes at a price, though: every clock edge produces a spike of current demand from the power supply to the chip, and faster clocks mean more spikes per second and a higher average current. Voltage rails are kept as low as possible to minimize the consequent power draw, but this is a trade-off with speed; lower supply voltages slow the slew rate of logic transitions, throttling back the maximum attainable speed.
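To put rough numbers on this trade-off, the sketch below uses the common first-order approximation for CMOS switching power, P = a × C × V² × f. The approximation is a textbook model rather than anything specific to a particular processor, and the capacitance, voltage and frequency values are hypothetical, chosen only for illustration.

```python
def dynamic_power(c_eff_farads, v_volts, f_hertz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_eff_farads * v_volts ** 2 * f_hertz


# Hypothetical figures for illustration only, not taken from any datasheet.
C_EFF = 20e-9  # effective switched capacitance in farads

base = dynamic_power(C_EFF, 1.0, 3.0e9)     # 1.0 V rail, 3 GHz clock
faster = dynamic_power(C_EFF, 1.0, 4.0e9)   # same rail, faster clock
lower_v = dynamic_power(C_EFF, 0.9, 3.0e9)  # 10% lower rail, same clock

print(f"baseline:      {base:.1f} W")
print(f"faster clock:  {faster:.1f} W ({faster / base:.2f}x)")
print(f"lower voltage: {lower_v:.1f} W ({lower_v / base:.2f}x)")
```

In this model power rises linearly with clock frequency but with the square of the rail voltage, which is why even a small voltage reduction is attractive whenever the logic can still meet timing.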

The problem of controlling clock speed for best performance has been around since the early days of personal computing: when the original 8088 processor running at 4.77MHz was superseded by faster CPUs, a ‘turbo’ button had to be introduced to slow down the clock so that legacy software of the time could cope. Today, clock speed is varied dynamically, mainly for maximum functional and thermal performance, with sophisticated algorithms monitoring workload, number of active cores, processor temperature and estimates of actual current draw and power consumption. Voltage rails are no longer necessarily fixed but can be varied under control of the processor using ‘Dynamic Voltage Scaling’ (DVS) to achieve an optimum combination of speed and watts consumed without exceeding critical chip temperatures. Processor vendors define ‘Thermal Design Power’ or TDP as the maximum power that can be continuously dissipated for a specific cooling arrangement, and ‘ACP’, the Average CPU Power, as an estimate of consumption under real-world conditions. TDP is typically 50% higher than ACP, but peak power demand for shorter periods can be 50% higher still. During these peaks, chip temperatures could exceed absolute maximums, so each peak is limited to much less than one second, with a controlled repetition rate to keep temperatures within bounds.
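As a worked example of these ratios, the short sketch below starts from a hypothetical 100 W ACP and applies the two 50% steps in turn. It assumes that "50% higher still" means 50% above TDP; the function name and the 100 W figure are illustrative only.

```python
def power_levels(acp_watts, tdp_over_acp=1.5, peak_over_tdp=1.5):
    """Return (TDP, peak) from ACP using the typical ratios quoted in the text.

    Assumption: the short-term peak is taken as 50% above TDP.
    """
    tdp = acp_watts * tdp_over_acp
    peak = tdp * peak_over_tdp
    return tdp, peak


ACP = 100.0  # hypothetical average CPU power in watts
tdp, peak = power_levels(ACP)

print(f"ACP:  {ACP:.0f} W")
print(f"TDP:  {tdp:.0f} W")
print(f"Peak: {peak:.0f} W ({peak / ACP:.2f}x ACP)")
```

Under that reading, the short-term peak is 2.25 times the average, which is the gap the power system has to bridge.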

The goal of power rail management for processors is to maximize performance for the end user, but energy savings and system cost are important too. Data center power consumption has surged recently with increased video streaming and conferencing, social networking, AI, machine learning and the IoT. This trend is set to accelerate, with real-time data in the global ‘datasphere’ predicted to increase ten-fold from 2018 levels by 2025 to 51 zettabytes [1]. The International Energy Agency estimates that energy consumed in data centers in 2020 was about 1% of global supply, or 200-250 TWh, and today the figure may be double that and rising. Given this background, and as the peak-to-average power demand ratio also increases with attempts to optimize performance, the scale and costs of the power supply and distribution system in data centers spiral as well. The increase is perhaps also disproportionate if the power system has to be ‘continuously’ rated for the peak load, representing a major financial and space burden.

Power converters with surge ratings mitigate problems

A mitigating strategy is to employ power converters that are rated continuously for the average load demand, but which can also supply peak or surge power for a limited time. Converters, like processors, have a ‘Thermal Design Power’ set by maximum internal temperatures, but can be designed to supply more power transiently, with the available margin depending on the starting temperature. In this strategy, each successive power conversion stage going upstream, from point-of-load (PoL) converters through bus converters to AC/DC and power factor correction stages, must have a similar surge power rating. At the low processor voltages, and even at 5V or 12V intermediate bus voltages, adding capacitive energy storage to supply the required peak demand is not practical due to its physical size.
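The dependence on starting temperature can be illustrated with a simple first-order thermal model, in which a hotspot heats exponentially towards a steady-state temperature set by the power dissipated inside the converter. This is a minimal sketch under stated assumptions: real converters have several hotspots and time constants, and the function name and all parameter values below are hypothetical.

```python
import math


def surge_time_limit(p_loss_w, r_th_c_per_w, tau_s, t_amb_c, t_start_c, t_max_c):
    """Seconds a surge can last before a hotspot reaches t_max_c.

    Single-pole model: T(t) = T_final + (T_start - T_final) * exp(-t / tau),
    where T_final = T_amb + P_loss * R_th is the steady-state temperature.
    """
    t_final = t_amb_c + p_loss_w * r_th_c_per_w
    if t_start_c >= t_max_c:
        return 0.0          # already at the limit, no surge headroom
    if t_final <= t_max_c:
        return math.inf     # the limit is never reached at this loss level
    return tau_s * math.log((t_final - t_start_c) / (t_final - t_max_c))


# Hypothetical values: 100 W dissipated during the surge, 1 degC/W thermal
# resistance, 5 s thermal time constant, 50 degC ambient, 125 degC limit.
cool = surge_time_limit(100.0, 1.0, 5.0, 50.0, 80.0, 125.0)
warm = surge_time_limit(100.0, 1.0, 5.0, 50.0, 110.0, 125.0)

print(f"surge time from 80 degC start:  {cool:.2f} s")
print(f"surge time from 110 degC start: {warm:.2f} s")
```

A converter that starts a surge cooler has more time before it reaches its limit, which is the behavior the surge rating relies on.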

An advantage of the scheme is that converters can be physically smaller and lower cost, and can be chosen for optimum efficiency at the average load power for lowest overall losses over time. The design of converters with a peak power rating is, however, more difficult: internal hotspots must be accurately identified and monitored during surges to ensure they do not exceed limits and reduce part reliability. Independent over-temperature shutdown must also be incorporated to cover fault conditions, not least those deliberately imposed by safety agency tests, where the worst achievable combination of input voltage, output current and ambient temperature will be explored. Digital control and monitoring of converters, particularly PoLs with a peak power rating, is ideal for providing the dynamic voltage scaling capability while communicating current and temperature readings to a power management controller or to the processor. A fast load transient response is also important to make the best use of DVS and to maintain output voltage within specification during peak power demand.

A good example of an intermediate bus converter with a high surge rating is found in the Flex Power Modules BMR491 series: a quarter-brick part rated at 1540W continuous power that can also deliver up to 2450W peak for one second (see figure). Input is 48-60V and output is 12V, with a transient response time of 300µs. A PMBus® digital interface provides the necessary control and monitoring, supported by the Flex Power Designer configuration software.

Other examples include the BMR492 and BMR350 series.

The Flex Power Modules BMR491X208/857 bus converter rated at 1540 W continuous, 2450 W peak power
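To give a feel for what such a rating means at system level, the sketch below takes the 1540 W continuous and 2450 W, one-second peak figures quoted above and estimates how often a surge could repeat if the long-term average output must stay within the continuous rating. This averaging rule is a simplifying assumption made for illustration, not the manufacturer's derating method, and the 1000 W baseline load is hypothetical; real repetition limits should be taken from the datasheet.

```python
def min_surge_period(p_cont_w, p_peak_w, t_surge_s, p_base_w):
    """Shortest repetition period keeping average power <= continuous rating.

    Assumed model: each period holds one surge at p_peak_w lasting t_surge_s,
    with the load sitting at p_base_w for the remainder of the period.
    """
    if p_base_w >= p_cont_w:
        raise ValueError("baseline load must be below the continuous rating")
    if p_peak_w <= p_cont_w:
        return t_surge_s  # no averaging constraint in this simple model
    return t_surge_s * (p_peak_w - p_base_w) / (p_cont_w - p_base_w)


P_CONT = 1540.0   # W, continuous rating quoted in the text
P_PEAK = 2450.0   # W, peak rating quoted in the text
T_SURGE = 1.0     # s, peak duration quoted in the text
P_BASE = 1000.0   # W, hypothetical load between surges

period = min_surge_period(P_CONT, P_PEAK, T_SURGE, P_BASE)
average = (P_PEAK * T_SURGE + P_BASE * (period - T_SURGE)) / period

print(f"peak/continuous ratio: {P_PEAK / P_CONT:.2f}")
print(f"minimum surge period:  {period:.2f} s")
print(f"average over a period: {average:.0f} W")
```

The peak-to-continuous ratio also indicates how much extra continuous capacity would be needed if the converter had to be rated for the peak load all the time.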

As data center throughput increases and processors use ever more ingenious ways to save energy, the power supply system must be similarly intelligent. Peak power capability combined with comprehensive digital control makes this practical.

Reference

[1] https://www.statista.com/stati...