Hyperscalers’ data centre buildouts and the corresponding demands on the power grid garner most of the media attention at the macro level, but AI’s electricity demands also create challenges for enterprise infrastructure and operations (I&O) teams running on-premises and colocation systems.
Constraints on power supply for existing data centres – whether dedicated buildings, floors, or rooms – have become the gating factor on scaling AI compute capacity. I&O leaders must find efficiencies to eke out every compute cycle per watt of available power. Storage and memory architectures are key to unlocking that infrastructure efficiency.
1) Legacy storage is leaking watts – and time
AI pipelines are brutally sensitive to data movement. Training checkpoints, vector stores, and retrieval-augmented inference unleash torrents of tiny, random I/Os that don’t align with storage stacks tuned for 4 KB to 128 KB blocks. The result is tail-latency spikes that stall accelerators, waste energy, and inflate cost per token or per training step.
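A quick way to see why averages hide this is to measure the tail directly. The sketch below (Python, Unix-only, with a placeholder file path) issues 4 KiB reads at random offsets and reports p50/p95/p99 latency; it is an illustration rather than a benchmark, since page-cache effects and interpreter overhead skew the numbers, and a tool such as fio with direct I/O is the right instrument for real testing.

```python
# Illustrative sketch: measure tail latency of small random reads (Unix-only).
# Not a rigorous benchmark: page cache and Python overhead skew results;
# use fio with direct I/O for procurement-grade testing.
import os
import random
import statistics
import time

PATH = "/data/sample.bin"   # placeholder: any large existing file
IO_SIZE = 4096              # 4 KiB, typical of checkpoint/vector workloads
SAMPLES = 10_000

fd = os.open(PATH, os.O_RDONLY)
file_size = os.fstat(fd).st_size
latencies_us = []

for _ in range(SAMPLES):
    offset = random.randrange(0, max(1, file_size - IO_SIZE))
    start = time.perf_counter()
    os.pread(fd, IO_SIZE, offset)
    latencies_us.append((time.perf_counter() - start) * 1e6)

os.close(fd)

q = statistics.quantiles(latencies_us, n=100)
p50, p95, p99 = q[49], q[94], q[98]
print(f"p50={p50:.1f}us  p95={p95:.1f}us  p99={p99:.1f}us")
# A p99 that is 10-100x the p50 is the signature of accelerator-stalling tail latency.
```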
Two structural issues magnify the problem:
- First, the memory wall: compute performance has far outpaced improvements in DRAM bandwidth and latency, leaving expensive cores waiting on data
- Second, scaling memory often means adding CPU sockets, stranding DRAM capacity, and consuming more power just to push bytes around. When local NVMe is underutilised – or the I/O size AI wants (sub-4 KB) collides with what SSDs are optimised for – clusters push bursts across the network, burning additional watts for no added insight
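A back-of-the-envelope calculation makes the second point concrete. Every figure below (socket idle power, DIMM power, capacities, utilisation) is an assumed, illustrative number rather than a measurement; the takeaway is that adding a socket purely for memory channels carries a fixed power tax and strands capacity, which is exactly what the CXL approaches discussed in the next section aim to avoid.

```python
# Back-of-the-envelope: power cost of adding a CPU socket just for memory channels.
# All numbers are illustrative assumptions, not measurements.
EXTRA_SOCKET_IDLE_W = 90        # assumed idle/uncore power of the added socket
DIMM_POWER_W = 4                # assumed power per 64 GB DDR5 DIMM
DIMMS_ADDED = 8                 # one DIMM per channel on the added socket
DIMM_CAPACITY_GB = 64
UTILISATION = 0.55              # assumed fraction of the added DRAM actually used

added_capacity_gb = DIMMS_ADDED * DIMM_CAPACITY_GB
added_power_w = EXTRA_SOCKET_IDLE_W + DIMMS_ADDED * DIMM_POWER_W
stranded_gb = added_capacity_gb * (1 - UTILISATION)

print(f"Added capacity: {added_capacity_gb} GB at ~{added_power_w} W "
      f"({added_power_w / added_capacity_gb:.2f} W/GB)")
print(f"Stranded at {UTILISATION:.0%} utilisation: {stranded_gb:.0f} GB, "
      f"roughly {stranded_gb / added_capacity_gb * added_power_w:.0f} W buying no work")
```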
2) Smarter storage minimises data movement – and power
- Treat data ‘hotness’ as a first-class signal: large operators separate hot, warm, and cold data to reduce needless data movement. Facebook/Meta’s f4 system, for example, keeps warm BLOBs on erasure-coded storage to cut capacity overhead while meeting latency targets – an early blueprint for energy-aware tiering. Meta’s later Kangaroo flash cache shows how designs optimised for tiny objects reduce DRAM use, cut flash writes, and improve hit ratios – turning I/O locality into fewer joules per query. New memory and storage technologies continue to hit the market, giving I&O teams more tiers in the memory-storage hierarchy across which to place data for power efficiency (a minimal placement-policy sketch follows this list)
- Embed compute where the data lives: offloading repetitive, byte-level work – transparent compression, deduplication, data integrity checking, metadata handling – into SSD controller state machines reduces host cycles, decreases data movement, and slashes write amplification. The Storage Networking Industry Association (SNIA) now standardises computational-storage models to enable vendor-neutral adoption across platforms (a rough write-reduction estimate follows this list)
- Optimise SSDs for AI’s fine-grain I/O: controllers that accelerate 512 B to 4 KB accesses and perform inline compression can lift effective IOPS ceilings, smooth checkpoint storms, and increase tokens-per-watt without requiring custom kernel forks
- Leverage ultra-dense SSDs in shared storage: SSD capacities have surpassed HDDs, with recent announcements of 128 TB in a single drive and higher densities on the horizon. While SSDs’ orders-of-magnitude IOPS advantage was the core driver of their adoption in high-performance applications, capacity per watt is the primary driver in this emerging tier. Ultra-dense drives let users pack petabytes per rack unit, shrinking rack count and power distribution overheads, while still delivering orders of magnitude more performance per TB than HDDs and avoiding the vibration and noise of rotating media (a quick rack-level calculation follows this list)
- Scale memory without more sockets: Compute Express Link (CXL) enables pooled and tiered memory over abundant PCIe lanes, reducing overprovisioning, avoiding stranded capacity, and improving utilisation. With robust ECC on CXL DIMMs/modules, platforms can increase addressable capacity at near-DRAM latencies while containing cost and power. These capabilities pair naturally with near-GPU NVMe to keep more hot and warm data close to accelerators, thus avoiding power-sapping and time-consuming pulls from colder tiers
- Engineer for energy proportionality: the goal is to do more work per watt. Prioritise designs that keep GPUs fed, collapse copies between tiers, and cut tail latency – the 95th/99th-percentile events that silently waste power and wall-clock time. GPUs and CPUs consume the lion’s share of power, with storage accounting for only ~5% of energy consumption; however, storage performance and intelligent data placement are critical to turning that processor power into useful work and avoiding wasted cycles and process stalls or failures (a tokens-per-watt sketch follows this list)
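On the data-temperature bullet above, the placement logic itself need not be exotic. The following minimal sketch assumes three tiers and two signals (recency and access rate); the tier names and thresholds are illustrative and would be tuned to the actual workload and the cost of each tier.

```python
# Minimal data-temperature policy: place objects by recency and access rate.
# Tier names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ObjectStats:
    hours_since_last_access: float
    accesses_per_day: float
    size_gb: float

def choose_tier(obj: ObjectStats) -> str:
    """Return the target tier for an object based on its 'temperature'."""
    if obj.hours_since_last_access < 1 or obj.accesses_per_day > 100:
        return "hot"    # e.g. near-GPU NVMe or DRAM/CXL-backed cache
    if obj.hours_since_last_access < 72 or obj.accesses_per_day > 1:
        return "warm"   # e.g. high-IOPS shared SSD with inline compression
    return "cold"       # e.g. ultra-dense QLC SSD or erasure-coded capacity tier

# Example: a week-old checkpoint still read a few times a day lands on the warm tier.
print(choose_tier(ObjectStats(hours_since_last_access=168, accesses_per_day=3, size_gb=40)))
```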
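On computational storage, the write-reduction benefit can be estimated before any hardware is committed. The sketch below uses host-side zlib as a stand-in for a drive’s inline compression engine; real controller engines and real data will behave differently, so the output is only a first-order estimate, and the synthetic sample should be replaced with representative checkpoint or table data.

```python
# First-order estimate of write reduction from inline (device-side) compression.
# Host-side zlib is only a proxy for an SSD controller's compression engine.
import zlib

def estimate_write_reduction(sample: bytes, daily_writes_tb: float) -> None:
    compressed = zlib.compress(sample, level=6)
    ratio = len(sample) / len(compressed)
    physical_writes_tb = daily_writes_tb / ratio
    print(f"Compression ratio on sample: {ratio:.2f}:1")
    print(f"Logical writes/day: {daily_writes_tb:.1f} TB -> "
          f"physical writes/day: {physical_writes_tb:.1f} TB")
    print(f"Bytes not written (less wear, fewer joules): "
          f"{daily_writes_tb - physical_writes_tb:.1f} TB/day")

# Replace this synthetic, highly compressible sample with your own data;
# it exists only to make the sketch runnable.
sample_data = (b"layer_weights:" + bytes(range(256)) * 64) * 16
estimate_write_reduction(sample_data, daily_writes_tb=50.0)
```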
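On ultra-dense SSDs, the rack-level arithmetic is worth writing down. The comparison below uses assumed drive counts, capacities, and per-device power figures, not vendor specifications, to show how capacity per rack unit and capacity per watt diverge between dense flash and nearline HDDs.

```python
# Illustrative capacity-per-rack-unit and capacity-per-watt comparison.
# Drive counts, capacities, and power figures are assumptions for illustration.
def shelf(name, drives, tb_per_drive, watts_per_drive, rack_units):
    capacity_pb = drives * tb_per_drive / 1000
    power_w = drives * watts_per_drive
    print(f"{name:>10}: {capacity_pb:5.2f} PB in {rack_units}U, "
          f"{power_w:4.0f} W, {capacity_pb * 1000 / power_w:5.2f} TB/W")

shelf("Dense SSD", drives=24, tb_per_drive=128, watts_per_drive=14, rack_units=2)
shelf("HDD",       drives=24, tb_per_drive=24,  watts_per_drive=9,  rack_units=2)
```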
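Finally, the energy-proportionality bullet can be made concrete with a simple sensitivity check: how fast does work-per-watt fall as accelerators wait on data? The node power and throughput figures below are assumptions for illustration.

```python
# Sensitivity of tokens-per-joule to GPU stall time. Numbers are illustrative.
NODE_POWER_W = 10_000          # assumed average node power (GPUs, CPUs, memory, storage)
PEAK_TOKENS_PER_S = 50_000     # assumed throughput with accelerators fully fed

for stall_fraction in (0.0, 0.10, 0.25, 0.40):
    effective_tps = PEAK_TOKENS_PER_S * (1 - stall_fraction)
    tokens_per_joule = effective_tps / NODE_POWER_W   # node power barely drops when stalled
    print(f"stall {stall_fraction:4.0%}: {effective_tps:8,.0f} tok/s, "
          f"{tokens_per_joule:.2f} tok/J")
```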
A practical checklist for design & procurement
- Instrument for stall time: track GPU idle versus storage/memory events; pay attention to tails, not just averages. Tie purchases to performance-per-watt under bursty checkpoints and RAG-style reads (a sketch follows this checklist)
- Adopt computational storage where writes are heavier: inline compression and write-reduction at the device cut joules, latency, and flash wear with minimal host changes
- Right-size memory with CXL: pilot pooled/tiered memory with advanced ECC to increase capacity without socket creep; measure batch size, spill behaviour, and job completion energy
- Design for data temperature: use policy-driven placement – hot near compute, warm on high-IOPS SSDs, cold on dense tiers – to reduce bytes-moved per inference or training step. Gartner’s recent Hype Cycle work highlights “integrated data intelligence,” “autonomous storage,” and “platforms for GenAI” as rising focus areas
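To make the first checklist item actionable, the sketch below assumes two time-aligned logs have already been exported from whatever telemetry is in place (the file names and column names are hypothetical placeholders): per-second GPU utilisation samples and storage I/O completion latencies. It reports GPU idle fraction alongside tail latency so the two can be reviewed together across runs; wiring it to your actual exporters is left to the reader.

```python
# Sketch: review GPU idle time alongside storage tail latency from exported logs.
# File names, column names, and thresholds are hypothetical placeholders.
import csv
import statistics

def gpu_idle_fraction(path="gpu_util.csv", busy_threshold_pct=10.0):
    """Fraction of samples where GPU utilisation is below the busy threshold."""
    with open(path) as f:
        utils = [float(row["util_pct"]) for row in csv.DictReader(f)]
    return sum(u < busy_threshold_pct for u in utils) / len(utils)

def io_latency_tails(path="io_latency_us.csv"):
    """p50/p95/p99 of I/O completion latencies, in microseconds."""
    with open(path) as f:
        lats = [float(row["latency_us"]) for row in csv.DictReader(f)]
    q = statistics.quantiles(lats, n=100)
    return q[49], q[94], q[98]

idle = gpu_idle_fraction()
p50, p95, p99 = io_latency_tails()
print(f"GPU idle fraction: {idle:.1%}")
print(f"I/O latency p50/p95/p99: {p50:.0f}/{p95:.0f}/{p99:.0f} us")
# Compare runs: if p99 and the idle fraction move together, storage is gating the GPUs.
```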
Bottom line
Electricity is the new constraint, and data movement is the silent tax. By evolving storage from a passive byte bucket to an active optimiser – through inline compression, write-reduction, AI-tuned controllers, ultra-dense media, and CXL-based memory tiers – I&O teams can deliver more tokens and queries per watt while meeting growing scrutiny from investors, regulators, and communities. The pay-off is immediate: fewer stalls, fewer copies, fewer watts – at scale.
About the author:

JB Baker, the Vice President of Products at ScaleFlux, is a successful technology business leader with a 20+ year track record of driving top and bottom-line growth through new products for enterprise and data centre storage. After gaining extensive experience in enterprise data storage and Flash technologies with Intel, LSI, and Seagate, he joined ScaleFlux in 2018 to lead Product Planning & Marketing as the company innovates efficiencies for the data pipeline. He earned his BA from Harvard and his MBA from Cornell’s Johnson School.