Amid one of the most severe DRAM shortages in years – with prices up nearly 50% and high-capacity modules facing long lead times – many AI developers are being forced to rethink how they build and deploy models. One emerging solution: shifting more intelligence to the Edge.
Lightweight, efficient models, like IBM’s Granite Nano series, are proving they can deliver strong performance within 1-2GB of memory, allowing companies to avoid inflated DRAM costs while improving reliability, latency, and privacy. Paired with specialised Edge accelerators like those from Hailo, these compact models can support tasks such as transcription, summarisation, and image enhancement without relying on Cloud capacity or expensive high-capacity DRAM.
I spoke with Avi Baum, CTO of Hailo, to discuss how companies are using Edge AI to “do more with less” and build more resilient, cost-efficient systems.
How have the current DRAM shortages and rising prices affected AI deployment strategies for enterprises?
The DRAM shortage has become an architectural constraint, pushing enterprises to rethink workloads that depend on large memory footprints. Higher-capacity modules (4-16GB) are seeing the steepest price spikes and longest lead times, with costs rising to as much as 200% of previous levels and even hyperscalers receiving only about 70% of their allocated volumes.
This scarcity makes Cloud-scale, memory-intensive AI models more expensive and harder to procure. In contrast, lower-capacity modules (1-2GB) remain available and more stable, which is encouraging teams to design or adopt smaller, domain-specific models that fit within modest memory baselines.
As a result, enterprises are shifting toward leaner AI architectures, including compact small language models (SLMs) and vision language models (VLMs), and are increasingly evaluating Edge-first or hybrid deployment approaches that avoid dependence on high-capacity DRAM and reduce supply chain risk.
How does processing AI workloads on the Edge impact latency, privacy, and system reliability compared with Cloud-based solutions?
Running AI on the Edge keeps interactions quick and consistent because the computation happens locally rather than travelling back and forth to the Cloud. These systems have become part of how people move through their day, whether recapping conversations, translating speech or refining audio and images. As expectations shift toward instant, uninterrupted performance, local processing has become a more reliable way to deliver the speed and consistency they expect.
On-device AI also changes how privacy is protected. When processing happens on the device or within a nearby gateway, the data involved is not sent to a remote server for interpretation. And as generative features become part of routine workflows, keeping information close to where it is created has become a meaningful advantage for both users and the organisations deploying these tools.
A third benefit is increased reliability. The Cloud outages that struck major providers in 2025 demonstrated how quickly remote dependency can weaken an entire AI workflow, with consumer and enterprise functions dropping offline in a single stroke. As AI becomes embedded in the tasks people perform repeatedly throughout the day, even short disruptions feel amplified. Moving the most frequently used intelligence onto the device or a nearby gateway removes that fragility and keeps core capabilities available regardless of the broader Cloud environment. The result is a system that feels faster, exposes less data and weathers disruptions far better than Cloud-only alternatives.
How do you see the role of lightweight AI models, such as IBM’s Granite Nano series, in enabling Edge deployments with limited memory?
Lightweight AI models help make Edge deployment practical by significantly reducing memory and compute requirements. While there is always a trade-off between model size and accuracy or capability, these smaller models have become increasingly accurate, enabling tasks that once required large, Cloud-hosted models to run locally on constrained devices. This shift allows Edge systems to handle more language understanding, reasoning, and control directly on-device, improving responsiveness and reducing reliance on the Cloud. As these compact models continue to improve, they open the door for a broader range of AI applications to move to the Edge.
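To make that concrete, the sketch below shows one way a compact instruct model in this size class could be loaded and prompted locally using the Hugging Face transformers library. The model identifier is a placeholder rather than a confirmed Granite Nano checkpoint, and half-precision weights are assumed; substitute whatever small model and runtime you actually deploy.

```python
# Minimal local-inference sketch for a compact (~1B-parameter) instruct model.
# The model ID below is a placeholder, not a verified checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/your-granite-nano-checkpoint"  # placeholder: substitute a real model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# float16 weights keep a ~1B-parameter model's footprint around 2GB
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

prompt = "Summarise the following meeting notes in two sentences:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```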
What trade-offs do companies face when reducing model size to fit within 1-2GB of memory?
The trade-offs typically involve a smaller training dataset, a lower degree of reasoning freedom, smaller context windows, or reduced generality compared with large Cloud models. However, many real-world tasks, such as controlling devices by learning their operation manuals, summarising calls or video streams, and translation, don’t require a full-scale LLM. Lightweight models make strong business sense in scenarios where latency, cost, privacy, or reliability are critical, such as smart home devices, industrial controllers, cameras, and offline environments. As these compact models continue to improve, more AI workloads will naturally shift to the Edge, where they can operate efficiently, privately, and at lower cost.
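As a rough illustration of why the 1-2GB budget shapes these trade-offs, the back-of-the-envelope sketch below estimates the weight footprint of a one-billion-parameter model at a few common precisions; activations, KV cache, and runtime overhead are ignored, so real requirements sit somewhat higher.

```python
# Back-of-the-envelope weight footprint for a model of a given size and precision.
# Ignores activations, KV cache, and runtime overhead, so real memory use is higher.
def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    return num_params * bits_per_param / 8 / 1e9

for precision, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"1B params @ {precision}: {weight_footprint_gb(1e9, bits):.2f} GB")
# 1B params @ FP16: 2.00 GB  (at the very top of a 1-2GB budget)
# 1B params @ INT8: 1.00 GB
# 1B params @ INT4: 0.50 GB
```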
Furthermore, in practice, the complexity of real-life applications tends to push toward hybrid solutions, where some of the workload is offloaded to the Edge while the rest remains in the Cloud.
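One way to picture such a hybrid split is a simple dispatcher that keeps short, latency-sensitive requests on the device and offloads long-context or open-ended work to a Cloud endpoint. The sketch below is purely illustrative: the token threshold, the local_generate() stub, and the endpoint URL are assumptions, not part of any particular product’s API.

```python
# Hypothetical Edge/Cloud dispatcher. The token threshold, local_generate() stub,
# and Cloud endpoint below are illustrative assumptions only.
import requests

LOCAL_CONTEXT_LIMIT = 2048                          # rough limit for the on-device SLM
CLOUD_ENDPOINT = "https://example.com/v1/generate"  # placeholder URL


def local_generate(prompt: str) -> str:
    # In a real system this would call the on-device model (see earlier sketch).
    return "[local model output]"


def generate(prompt: str, approx_tokens: int) -> str:
    # Latency-sensitive, small-context requests stay on the Edge device.
    if approx_tokens <= LOCAL_CONTEXT_LIMIT:
        return local_generate(prompt)
    # Long-context or open-ended requests fall back to the Cloud.
    resp = requests.post(CLOUD_ENDPOINT, json={"prompt": prompt}, timeout=30)
    resp.raise_for_status()
    return resp.json()["text"]
```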
How do Hailo accelerators optimise performance for these compact models and what kinds of applications are your customers running?
Hailo accelerators are designed to process AI workloads with high efficiency and low power consumption. The unique dataflow architecture allows optimised utilisation of compute and memory resources, enabling SLMs to run efficiently and locally on Edge devices.

Customers are using Hailo accelerators for a wide range of Edge applications that benefit from running advanced AI locally. Hailo has hundreds of customer programmes and products in deployment in industries ranging from security and surveillance to retail, healthcare, automotive, robotics, and more. We are just starting to see customers deploying Generative AI into real-world products, mainly around AI agents and assistants of all kinds.
How do you expect Edge AI adoption to evolve over the next few years in response to memory supply constraints?
We expect Edge AI adoption to accelerate, and not just because of memory constraints. From our perspective at Hailo, we are looking at a broader architectural inflection point. As Generative AI applications become a commodity, organisations and consumers are prioritising reliability, availability, latency, and privacy in ways that favour local processing, and they are increasingly designing for efficiency rather than building around assumptions of unconstrained resources.
The ongoing pressure on high-capacity DRAM is pushing teams to rethink the assumption that larger models must run in centralised environments and instead evaluate architectures that can operate efficiently on more predictable, readily available memory tiers. As compact SLMs and VLMs mature, they will make it increasingly easy for developers to shift AI tasks out of the Cloud and onto everyday Edge devices.
Are there any upcoming innovations in Hailo’s technology that will make it easier for companies to “do more with less”?
In 2026 we’re looking forward to announcing more partner integrations across a diverse range of industries and sectors, as well as new products and solutions that make AI more accessible and affordable for everyone.

By Paige Hookway, Managing Editor, Electronic Specifier
This article originally appeared in the January’26 magazine issue of Electronic Specifier Design – see ES’s Magazine Archives for more featured publications.
