Empowering the data centre for AI workloads

03 June 2025
4 minutes
Arm's Eddie Ramirez explores how the right hardware can augment the entire AI stack
Eddie Ramirez, vice president of go-to-market, Infrastructure Line of Business at Arm, superimposed in front of the company's logo and next to an STM32 microchip held in tweezers

As AI workloads become more diverse, no single computing solution can meet every challenge. Data centres must manage escalating power demands, rising data volumes, and the push for sustainable operations, all while keeping pace with AI’s rapid evolution.

Meeting these demands requires a distributed approach that bridges the cloud and edge, using CPUs, GPUs, and NPUs in tandem to deliver flexibility and performance.

Assigning different workloads to processors designed for specific tasks improves both performance and energy efficiency. By enabling accelerated compute—where CPUs and accelerators are co-packaged or connected with high-bandwidth, memory-coherent links—technology partners can build customised silicon to meet the specific needs of AI workloads.

Tightly coupled CPU compute is key to the AI stack, and I’ve seen how transformative it can be for developers. Nvidia’s Grace Hopper and Grace Blackwell systems demonstrate this well, using Arm CPUs to reduce latency between CPU and GPU.

The result: up to 30x higher inference performance and 25x lower energy consumption compared with H100 GPUs. This coupling provides the accessibility, programmability, and flexibility needed to build and scale AI applications efficiently.

CPUs are essential—not just for inference, but for data pre-processing, orchestration, and managing software across diverse data formats. Their flexibility means developers don't need to maintain multiple code versions, and the broad availability of Arm-based CPUs, such as AWS's Graviton, makes them a natural fit for accelerated AI workloads.
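This division of labour can be sketched in a few lines of Python. Everything here is a toy stand-in rather than a real API: the byte-level "tokenizer" substitutes for a production tokenizer, and accelerator_infer is a hypothetical placeholder for a GPU or NPU kernel; the point is simply that pre-processing and batching run on the CPU, while the heavy compute is dispatched to the accelerator.

```python
from typing import Iterator

def preprocess(records: list[str]) -> list[list[int]]:
    """CPU-side pre-processing: normalise text and encode to token ids
    (a toy byte-level encoding stands in for a real tokenizer)."""
    return [list(r.strip().lower().encode("utf-8")) for r in records]

def batches(items: list, size: int) -> Iterator[list]:
    """CPU-side orchestration: group work into accelerator-sized batches."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def accelerator_infer(batch: list[list[int]]) -> list[int]:
    """Hypothetical stand-in for a GPU/NPU kernel; here it just sums
    the token ids of each sample."""
    return [sum(sample) for sample in batch]

def pipeline(records: list[str], batch_size: int = 2) -> list[int]:
    """Orchestrate the full flow: preprocess on CPU, batch, then offload."""
    results: list[int] = []
    for batch in batches(preprocess(records), batch_size):
        results.extend(accelerator_infer(batch))
    return results
```

Because the CPU-side code is plain, portable logic, the same pipeline can run unchanged whether the accelerator behind it is a GPU, an NPU, or a CPU fallback.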

The new Graviton4, for instance, supports generative AI applications such as chatbots with high performance at low cost.

From embedded devices to massive data centres, CPUs serve as the backbone for AI, enabling silicon customisation and tighter integration with accelerators. This is ushering in a new era of scalable, high-performance AI across the cloud and edge.

The shift toward chiplets, where silicon is modularised into smaller, specialised components, is transforming the landscape of server CPUs, DPUs, and AI servers. This design approach makes systems more economically viable and adaptable.

Innovation thrives by building a collaborative ecosystem that brings together technologies, standards, and protocols. This approach leverages the Universal Chiplet Interconnect Express (UCIe) physical interface to drive chiplet adoption and pave the way for the future of computing.

Addressing the challenges of custom silicon development requires building a partner ecosystem dedicated to driving innovation for the data centres of the future.

I’m excited about the progress our industry has made, transforming into a hub of collaboration and efficiency. The goal is simple but impactful: to enable seamless integration of chiplets from diverse partners, creating interoperable, reusable components that lower costs and open new doors for the entire industry.

Standardisation focuses on developing common interfaces and validating IPs to ensure interoperability and readiness for widespread adoption. This includes advancing efforts such as proofs of concept, test chips, and fully product-ready chiplets.

Third-party IP vendors are also a critical part of this ecosystem to deliver complementary IPs, such as the UCIe interface that connects chiplets. Aligning roadmaps and pooling expertise is establishing a unified approach that’s driving meaningful progress across the industry.

One strong example of this in action is a collaboration in South Korea, where a three-way partnership is building an AI CPU chiplet platform.

ADTechnology provides the Arm compute chiplet, Samsung Foundry supplies an advanced 2nm process, and Rebellions contributes an AI accelerator targeting 2–3x the efficiency of current GPUs for GenAI tasks. The chiplets are tightly linked via an AMBA CHI C2C interconnect, guided by Arm’s Chiplet System Architecture (CSA) specification.

In the rapidly evolving AI landscape, efficiency and design flexibility are essential. Accelerating the development of chiplet solutions is key to building a foundation for sustainable AI data centres.

This system-level approach lowers the barrier to entry, encourages innovation, and enables faster, more secure AI—from cloud to edge.

This article first appeared in Datacloud Magazine – June 2025

Eddie Ramirez is the vice president of go-to-market, Infrastructure Line of Business at Arm