Keeping up with the rapid transformation in data centre power architectures

As AI usage expands exponentially, the hardware that supports it is as critical to the AI revolution as the large language models (LLMs) themselves. While data centres range from giant hyperscale projects to edge installations, all operators are focused on the same things: deploying AI quickly, delivering and using power efficiently, and future-proofing IT infrastructure so it can support the needs of AI accelerators such as graphics processing units (GPUs) and tensor processing units (TPUs).

Power and cooling requirements

The scale of technological change happening in data centres is immense. For context, estimates put the average pre-AI, non-high-performance-computing (HPC) data centre rack at about 8kW. Today, the industry is pushing toward one-to-three-megawatt racks, more than 100 times that power.
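A quick sanity check of that multiplier, sketched in Python below; the 8kW baseline and the one-to-three-megawatt targets are simply the estimates cited above:

```python
# Back-of-the-envelope check of the rack power growth cited above.
legacy_rack_kw = 8            # estimated pre-AI average rack power
ai_rack_kw = (1_000, 3_000)   # 1-3 MW targets, expressed in kW

for target in ai_rack_kw:
    print(f"{target} kW is {target / legacy_rack_kw:.0f}x the legacy 8 kW rack")
# 1000 kW -> 125x, 3000 kW -> 375x: "more than 100 times" holds
```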

As AI applications are deployed across most industries, supporting the technology demands a total paradigm shift in data centre infrastructure.

Cooling reference architectures

Given the sheer demand for power, liquid cooling is imperative in AI data centres, as air-based cooling technology cannot keep GPUs within their thermal limits. To implement suitable solutions successfully, the industry needs a flexible and collaborative framework for developing AI cooling infrastructure standards. Moving toward common infrastructure standards and an interoperable model will help companies accelerate development while continuing to pursue unique innovations.

Modularity and standard interfaces allow data centres to deploy technologies faster, providing a foundation upon which infrastructure providers can innovate for efficiency and performance. Reference architectures will still allow for unique and differentiated designs in coolant distribution units (CDUs), rear door coolers, heat rejection units and manifolds, and technology cooling systems. At the same time, they provide compatibility and interface standardisation, allowing products from multiple suppliers to be mixed and matched.
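To illustrate what interface standardisation means in practice, the hypothetical Python sketch below defines one common interface that differently engineered CDUs from multiple suppliers could all satisfy; the class and method names are invented for illustration and are not drawn from any real standard:

```python
from typing import Protocol

class CoolantDistributionUnit(Protocol):
    """Hypothetical common interface a reference architecture might define.

    Any vendor's CDU exposing these operations can be mixed and matched
    with other compliant components, while its internal design (pumps,
    heat exchangers, controls) remains fully proprietary.
    """

    def supply_temperature_c(self) -> float: ...
    def return_temperature_c(self) -> float: ...
    def set_flow_rate_lpm(self, litres_per_minute: float) -> None: ...

class VendorACdu:
    """One vendor's implementation; internals differ, interface matches."""
    def __init__(self) -> None:
        self._flow = 0.0
    def supply_temperature_c(self) -> float:
        return 32.0
    def return_temperature_c(self) -> float:
        return 42.0
    def set_flow_rate_lpm(self, litres_per_minute: float) -> None:
        self._flow = litres_per_minute

def commission(cdu: CoolantDistributionUnit) -> None:
    # Facility-level code depends only on the shared interface,
    # so any compliant vendor's unit can be dropped in.
    cdu.set_flow_rate_lpm(1200.0)
    print(cdu.supply_temperature_c(), cdu.return_temperature_c())

commission(VendorACdu())
```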

The liquid cooling architecture used for today's high-density racks is a closed-loop system in which a treated fluid is continuously recirculated. Because it relies on conduction rather than convection, direct-to-chip cooling avoids evaporative loss and significantly outperforms air cooling. This architecture not only optimises thermal management but also produces high-grade waste heat, creating viable opportunities for heat reuse across the data centre.
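To give a feel for the numbers involved, the sketch below estimates the coolant flow a closed loop would need to carry away 1MW of rack heat using the standard heat balance Q = ṁ·c_p·ΔT; the 10°C supply-to-return temperature rise and the plain-water fluid properties are assumptions for illustration only:

```python
# Estimated coolant flow to remove 1 MW of heat in a closed loop,
# using Q = m_dot * c_p * delta_T (assumed figures, illustration only).
heat_load_w = 1_000_000     # 1 MW rack
c_p = 4186.0                # specific heat of water, J/(kg*K) (assumed fluid)
delta_t_k = 10.0            # assumed supply-to-return temperature rise

mass_flow_kg_s = heat_load_w / (c_p * delta_t_k)   # ~23.9 kg/s
volume_flow_lpm = mass_flow_kg_s * 60              # ~1 L per kg for water
print(f"{mass_flow_kg_s:.1f} kg/s, about {volume_flow_lpm:.0f} L/min")
```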

High-voltage DC power

For megawatt-scale rack power delivery, the industry is moving to 800-volt direct current (VDC) power distribution and potentially 1500VDC in the long term.

This shift in power delivery architecture can offer many benefits, including reducing copper usage and minimising resistive losses. Bringing DC power all the way to racks also reduces the number of AC/DC conversions in a data centre. This can make installation easier and lower costs for data centre operators, in addition to minimising power losses, which are inherent any time power is converted from one form to another.
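The sketch below illustrates both effects with assumed figures: for a fixed cable resistance, delivering the same power at a higher voltage cuts the current and therefore the I²R loss, and removing conversion stages compounds into higher end-to-end efficiency. The feeder resistance and per-stage efficiencies are illustrative assumptions, not measured values:

```python
# Illustrative comparison only; resistance and stage efficiencies are assumed.
power_w = 1_000_000           # 1 MW delivered to a rack
cable_resistance_ohm = 0.001  # assumed round-trip feeder resistance

for volts in (415, 800, 1500):
    current = power_w / volts
    loss_w = current**2 * cable_resistance_ohm   # I^2 * R resistive loss
    print(f"{volts} V: {current:,.0f} A, {loss_w/1000:.1f} kW lost in the feeder")

# Fewer AC/DC conversion stages compound into higher end-to-end efficiency.
stage_eff = 0.97  # assumed efficiency per conversion stage
for stages in (4, 2):
    print(f"{stages} conversions: {stage_eff**stages:.1%} end-to-end")
```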

Moving to DC power can also add simplicity to complex data centre construction projects because electricity does not need to be converted and reconverted several times on its way to IT racks. This shift can also enhance the scalability of data centres because power distribution infrastructure doesn’t need to be redesigned when IT infrastructure is added, just expanded spatially to include additional racks.

However, data centre operators need to understand that power products such as AC/DC converters, busbars and busways, rack power delivery, and protection and monitoring solutions will need to be redesigned from today's forms to fit into a DC power architecture.

Power and cooling convergence

Cooling and power infrastructure in data centres need to work in tandem. To achieve maximum output efficiency, GPUs need the right amount of power, and the resulting heat must be dissipated by cooling infrastructure to keep the chips within their thermal operating parameters.

Intelligently optimising cooling and power is where data centre operators can drive real gains in efficiency and performance. Given the massive increase in rack power and the resulting transients, we believe a connective framework, advanced control algorithms and a software management layer between the IT equipment and the power and cooling infrastructure will become imperative to managing that infrastructure efficiently and safely.
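In the simplest terms, such a software layer closes the loop between IT telemetry and the facility. The toy sketch below, with invented names and thresholds and purely conceptual logic, raises the coolant pump setpoint as reported rack power climbs, with a slew limit to ride out the transients the prose describes:

```python
def adjust_cooling(rack_power_kw: float, flow_lpm: float) -> float:
    """Toy control step: scale coolant flow with reported rack power.

    Purely conceptual; real control algorithms would use vendor telemetry,
    predictive models and safety interlocks, none of which are shown here.
    """
    KW_PER_LPM = 0.7   # assumed heat-removal capacity per unit of flow
    target_flow = rack_power_kw / KW_PER_LPM
    # Slew-limit the change so pumps track transients without oscillating.
    max_step = 50.0
    delta = max(-max_step, min(max_step, target_flow - flow_lpm))
    return flow_lpm + delta

flow = 1000.0
for power in (700, 1200, 900):   # simulated rack power transient, kW
    flow = adjust_cooling(power, flow)
    print(f"rack at {power} kW -> pump setpoint {flow:.0f} L/min")
```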

AI’s next phase will be won or lost in the physical layer. Megawatt-class racks demand liquid cooling built on interoperable reference architectures, and high-voltage DC distribution to reduce losses, materials and complexity. The opportunity now is to converge power, cooling and software controls to scale efficiently, safely and predictably.

About the author:

Robert van der Kolk, President of EMEA and APAC at nVent
