Breakthrough CXL memory solution for AI workloads

As enterprises accelerate adoption of large language models (LLMs), Generative AI, and real-time inference applications, a new bottleneck has emerged: memory scale, bandwidth, and latency.

XConn Technologies and MemVerge have announced a joint demonstration of a Compute Express Link (CXL) memory pool designed to break through the AI memory wall. The live demo will take place at Supercomputing 2025 (SC25) in St. Louis, 16–21 November 2025, at booth #817, stations 2 and 8.

Academic and industry analysts agree that memory bandwidth growth has lagged far behind compute performance. While server FLOPS have surged, DRAM and interconnect bandwidth have scaled much more slowly, making memory the dominant bottleneck for many AI inference workloads. Experts warn that AI growth is already hitting a memory wall, forcing memory and interconnect architectures to evolve rapidly. The memory-intensive nature of retrieval-augmented generation, vector search, agentic AI, and LLM inference is pushing traditional DDR and HBM-based server architectures to their limits, creating both performance and TCO challenges.
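To illustrate the scale of the problem, the back-of-envelope sketch below estimates the KV cache footprint of a long-context LLM serving run. The model dimensions and serving parameters are illustrative assumptions, not figures from the demo.

```python
# Back-of-envelope KV cache sizing for LLM inference (illustrative numbers only).
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_elem
#                  * context_length * concurrent_sequences

layers = 80            # assumed transformer depth (70B-class model)
kv_heads = 8           # assumed grouped-query KV heads
head_dim = 128         # assumed per-head dimension
bytes_per_elem = 2     # FP16/BF16
context_len = 128_000  # assumed long-context window
batch = 64             # assumed concurrent sequences

per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
total = per_token * context_len * batch
print(f"KV cache per token: {per_token / 1024:.1f} KiB")
print(f"Total KV cache:     {total / 2**40:.2f} TiB")
```

Even with these modest assumptions, the working set runs into terabytes, far beyond the HBM attached to a single GPU node, which is why offload targets such as a shared CXL memory pool become attractive.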

“As AI workloads and model sizes explode, the limiting factor is no longer just GPU count; it’s how much memory can be shared, how fast it can be accessed, and how cost-efficiently it can scale,” said Gerry Fan, CEO of XConn Technologies. “Our collaboration with MemVerge demonstrates that CXL memory pooling at 100TiB and beyond is production-ready, not theoretical. This is the architecture that makes large-scale AI inference truly feasible.”

To address these challenges, XConn and MemVerge are demonstrating a rack-scale CXL memory pooling solution built around XConn’s Apollo hybrid CXL/PCIe switch and MemVerge’s Gismo technology, optimised for NVIDIA’s Dynamo architecture and NIXL software stack. The demo showcases how AI inference workloads can offload and share massive KV cache resources dynamically across GPUs and CPUs, achieving greater than 5× performance improvements compared with SSD-based caching or RDMA-based KV cache offloading, while reducing total cost of ownership. In particular, the demo shows a scalable memory architecture for AI inference workloads in which the prefill and decode stages are disaggregated.
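The sketch below is a minimal illustration of the offload pattern described above: a small fast tier standing in for GPU HBM backed by a much larger pool tier standing in for CXL-attached memory. It does not use the GISMO, NIXL, or Dynamo APIs; the class and its methods are hypothetical and exist only to show how prefill-produced KV blocks can be evicted to a pool and later fetched during decode instead of being recomputed.

```python
# Minimal two-tier KV cache sketch: "hbm" is a small fast tier, "pool" stands in
# for a large shared CXL memory pool. Illustrative only; not the demo's software.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_capacity_blocks: int):
        self.hbm = OrderedDict()   # block_id -> KV block bytes (fast tier)
        self.pool = {}             # block_id -> KV block bytes (pool tier)
        self.capacity = hbm_capacity_blocks

    def put(self, block_id, block):
        """Prefill writes a KV block; least-recently-used blocks spill to the pool."""
        self.hbm[block_id] = block
        self.hbm.move_to_end(block_id)
        while len(self.hbm) > self.capacity:
            victim, data = self.hbm.popitem(last=False)
            self.pool[victim] = data           # offload instead of discarding

    def get(self, block_id):
        """Decode reads a KV block, promoting it back to the fast tier if needed."""
        if block_id in self.hbm:
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        block = self.pool.pop(block_id)        # fetch from the shared pool
        self.put(block_id, block)
        return block

# Example: prefill produces more blocks than the fast tier can hold,
# yet decode can still reuse them without recomputation.
cache = TieredKVCache(hbm_capacity_blocks=2)
for i in range(4):
    cache.put(i, f"kv-block-{i}".encode())
assert cache.get(0) == b"kv-block-0"   # served from the pool, not recomputed
```

When prefill and decode run on different devices, a pool shared at this layer is what allows the decode side to pick up KV blocks the prefill side produced, rather than repeating the prefill computation.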

“Memory has become the new frontier of AI infrastructure innovation,” said Charles Fan, CEO and co-founder of MemVerge. “By using MemVerge GISMO with XConn’s Apollo switch, we’re showcasing software-defined, elastic CXL memory that delivers the performance and flexibility needed to power the next wave of agentic AI and hyperscale inference. Together, we’re redefining how memory is provisioned and utilised in AI data centres.”

As AI becomes increasingly data-centric and memory-bound, rather than compute-bound, traditional server architectures can no longer keep up. CXL memory pooling addresses these limitations by enabling dynamic, low-latency memory sharing across CPUs, GPUs, and accelerators. It scales to hundreds of terabytes of shared memory, reduces TCO through better utilisation and less over-provisioning, and improves throughput for inference-first workloads, Generative AI, real-time analytics, and in-memory databases.
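A simple capacity model shows why pooling reduces over-provisioning: sizing each server for its own peak demand costs far more memory than sizing a shared pool for the aggregate demand. The fleet size and memory figures below are assumptions chosen for illustration, not measurements from the demo.

```python
# Illustrative comparison of per-server memory provisioning vs. a shared pool.
# All numbers are assumptions, not figures from XConn or MemVerge.
servers = 16
peak_per_server_tib = 6      # assumed worst-case working set per server
avg_per_server_tib = 2       # assumed average working set per server
headroom = 1.25              # assumed safety margin on the shared pool

dedicated_tib = servers * peak_per_server_tib      # every server sized for its peak
pooled_tib = servers * avg_per_server_tib * headroom

print(f"Dedicated DRAM needed: {dedicated_tib} TiB")
print(f"Shared CXL pool:       {pooled_tib:.0f} TiB")
print(f"Reduction:             {100 * (1 - pooled_tib / dedicated_tib):.0f}%")
```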

SC25 attendees can experience the joint demo featuring a CXL memory pool dynamically shared across CPUs and GPUs, with inferencing benchmarks illustrating significant performance and efficiency gains for KV cache offload and AI model execution.
