Designing real-world AI hardware at the Edge

Everyone is selling AI now – from Cloud platforms to AI PCs and Neural Processing Units (NPUs) on every System on Chip (SoC), whether you want it or not – but embedded engineers don’t ship hype; they ship efficient, targeted, working systems. The reality on the factory floor, in vehicles or in remote devices is that AI lives or dies on how well the design utilises multiply-accumulate operations (the building block of any AI accelerator), memory and milliwatts, not marketing slides.

The numbers tell a quieter, more pragmatic story than the vendor presentations suggest. According to IDC, global Edge computing spend is forecast to reach around $378 billion by 2028, whilst IoT Analytics reports that the installed base of connected IoT devices is expected to climb to 21.1 billion in 2025.

This isn’t a science experiment anymore. Edge AI and embedded intelligence are large, fast-growing markets. The question for engineers isn’t if AI belongs at the Edge, but where and how much.

From AI hype to embedded reality

Strip away the buzzwords and ‘AI hardware’ simply means making trade-offs around processing resources, data movement and storage, and energy. You might be working with a microcontroller (MCU) sporting digital signal processor (DSP) or Convolutional Neural Network (CNN) accelerator blocks consuming milliwatts, an application SoC with a dedicated NPU, a graphics processing unit (GPU) module, or purpose-built inference accelerators. What matters isn’t the marketing label; it’s whether you can feed the most efficient compute engine for your application with enough data within your power budget.

The benefits of implementing AI at the Edge, rather than sending data to a remote server farm for processing, are clear. Edge AI algorithms execute locally, greatly reducing latency, minimising external data bandwidth requirements and connection costs, and helping to meet a growing number of privacy constraints – all with resource-constrained hardware as the core assumption. That’s the engineering reality: you’re optimising for constraints, not chasing peak performance.

Performance metrics like Tera-Operations Per Second (TOPS) dominate datasheets, but embedded engineers care far more about TOPS per watt. Even at the lower end, devices like the Silicon Labs PG28 ARM Cortex-M33 MCU, costing just a couple of dollars, now integrate an NPU capable of running AI models eight times faster and at 1/6th of the power of the ARM core itself, demonstrating that embedded intelligence can live in milliwatt-class silicon.
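
To see why that matters, the ratios quoted above can be converted into energy per inference. The absolute CPU figures in the sketch below are assumptions for illustration; only the 8x speed and 1/6th power ratios come from the example above.

```python
# Illustrative back-of-envelope comparison: energy per inference on a CPU core
# versus an integrated NPU, using the ratios cited above (8x faster, 1/6 power).
# The absolute CPU figures are assumptions for illustration only.

cpu_power_mw = 30.0            # assumed active power of the MCU core during inference
cpu_latency_ms = 40.0          # assumed time for one inference on the CPU core

npu_power_mw = cpu_power_mw / 6.0      # "1/6th of the power"
npu_latency_ms = cpu_latency_ms / 8.0  # "eight times faster"

cpu_energy_uj = cpu_power_mw * cpu_latency_ms  # mW x ms = microjoules
npu_energy_uj = npu_power_mw * npu_latency_ms

print(f"CPU: {cpu_energy_uj:.0f} uJ per inference")
print(f"NPU: {npu_energy_uj:.0f} uJ per inference")
print(f"Energy saving: {cpu_energy_uj / npu_energy_uj:.0f}x")  # ~48x per inference
```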

Defining embedded intelligence in hardware terms

For embedded engineers, AI hardware is just another set of trade-offs and bottlenecks to juggle. TOPS on a slide means nothing if you can’t feed the accelerator with enough data or keep it within your power envelope.

The distinction between different hardware approaches matters in practical terms, and it helps to think about the dimensionality of the data in your application. One-dimensional AI, which only needs to process a single serial stream of data – audio for keyword spotting, vibration signatures for predictive maintenance, or time series for anomaly detection – is less complex and well suited to microcontrollers with integrated NPU/CNN accelerators operating in the milliwatt range. These are low-cost, always-on, ultra-low-power applications where the signal being processed is a potentially complex waveform changing over time, and where the machine learning model’s job is to identify the distinctive patterns in that data that define its desired outputs.
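
As a rough sketch of what such a one-dimensional workload looks like in software, the snippet below windows a vibration stream and runs each window through an int8 TensorFlow Lite model. The model file, window length and synthetic trace are hypothetical placeholders; on the target MCU the same structure would usually be written in C against the vendor’s SDK or TensorFlow Lite for Microcontrollers.

```python
# Sketch: windowed inference over a 1D sensor stream with an int8 TFLite model.
# "vibration_int8.tflite", the window length and the synthetic trace are placeholders.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="vibration_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
scale, zero_point = inp["quantization"]  # int8 quantisation parameters of the model input

WINDOW = 1024  # samples per inference window (must match the model's input size)

def classify(window_f32: np.ndarray) -> np.ndarray:
    """Quantise one window of float samples and run a single inference."""
    q = np.clip(np.round(window_f32 / scale + zero_point), -128, 127).astype(np.int8)
    interpreter.set_tensor(inp["index"], q.reshape(inp["shape"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

# Stand-in for a recorded accelerometer trace; replace with real sensor data.
trace = np.random.randn(10_000).astype(np.float32)
for start in range(0, len(trace) - WINDOW, WINDOW):
    scores = classify(trace[start:start + WINDOW])
```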

Video applications introduce a second dimension to the data, greatly increasing the AI processing requirements in applications such as multi-camera vision systems, real-time object detection and visual inspection. These applications require SoCs and System on Modules (SoMs) with dedicated NPUs sitting in the watts range, or even GPU-based Edge boxes consuming tens of watts for heavier workloads such as AI model creation or dynamic model switching. The compute density and memory bandwidth requirements jump significantly when you’re processing image frames rather than sensor streams.
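
The size of that jump is easy to quantify. The comparison below uses assumed but typical figures for a three-axis accelerometer and an uncompressed 1080p camera; the exact numbers will differ in practice, but the gap of several orders of magnitude is the point.

```python
# Illustrative raw data rates: 1D sensor stream vs a single 2D video stream.
# Sensor and camera parameters are assumed, typical values.

accel_rate = 3 * 1_000 * 2         # 3 axes, 1 kHz, 16-bit samples -> bytes/s
video_rate = 1920 * 1080 * 3 * 30  # 1080p, RGB888, 30 fps -> bytes/s

print(f"Accelerometer: {accel_rate / 1e3:.0f} kB/s")   # ~6 kB/s
print(f"1080p camera:  {video_rate / 1e6:.0f} MB/s")   # ~187 MB/s
print(f"Ratio: ~{video_rate / accel_rate:,.0f}x")      # roughly 31,000x more data
```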

Understanding these trade-offs in concrete terms – multiply-accumulate operations per second, memory bandwidth, thermal dissipation, interface latency – is what separates specification sheets from shippable products. The hardware exists across a spectrum, and the engineering challenge is matching it to the workload and constraints you actually face.

However, there’s a human dimension too. Not every engineer is keen to hand over decades of domain knowledge to a black-box model. One motor-monitoring specialist put it bluntly: after years building hand-crafted mathematical models of bearing wear and vibration signatures, why would he ‘throw AI at it’ and lose visibility into what the system is actually detecting? That reluctance isn’t Luddism – it’s a legitimate concern about explainability, trust and maintainability. The best embedded AI projects don’t replace engineering judgement; they augment it with tools engineers can understand, validate and trust.

Where AI on the Edge fits (and where it doesn’t)

The art isn’t running everything at the Edge or pushing everything to the Cloud. It’s carving up the problem so that each layer does what it’s best at.

Use the Edge when you must: for latency, bandwidth, privacy, or resilience. Autonomous vehicles, industrial control, and smart healthcare often require millisecond response times and enhanced data security and can’t tolerate Cloud latency or connectivity loss. Sensor suites in autonomous vehicles (cameras, radar, lidar) generate tens of terabytes daily, making Cloud streaming impractical. Processing data close to sensors reduces network traffic and energy whilst keeping sensitive data local for regulatory compliance.
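
Converting that daily volume into a sustained uplink requirement makes the point; 10 TB per day is used here purely as an illustrative figure at the lower end of that range.

```python
# Sustained uplink needed to stream an assumed 10 TB/day of sensor data to the Cloud.
daily_bytes = 10e12                       # 10 TB per day (illustrative figure)
sustained_bps = daily_bytes * 8 / 86_400  # bits per second, sustained around the clock
print(f"Required sustained uplink: ~{sustained_bps / 1e6:.0f} Mbit/s")  # ~926 Mbit/s
```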

Use the Cloud when you can: for heavy training, fleet-wide retraining, and data aggregation. Hybrid architectures often make the most sense, with lightweight inference at the Edge and periodic model updates handled centrally.

Accuracy, safety and ‘good enough’ intelligence

In embedded and safety-critical systems, ‘good enough’ AI is a very precise phrase. You must know what accuracy, false positives, and false negatives mean in real operational terms – and keep deterministic safety nets in place.
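
A worked example makes those operational terms concrete. The fleet size, inspection rate and error rates below are illustrative assumptions rather than figures from any real deployment.

```python
# Illustrative: what error rates on a fault detector mean operationally.
# Fleet size, check rate and error rates are assumed for illustration.

machines = 1_000
checks_per_day = 24           # one inference per machine per hour
fault_rate = 0.001            # fraction of checks where a real fault is present

false_positive_rate = 0.01    # healthy checks incorrectly flagged
false_negative_rate = 0.05    # real faults missed

checks = machines * checks_per_day
faulty = checks * fault_rate
healthy = checks - faulty

false_alarms_per_day = healthy * false_positive_rate
missed_faults_per_day = faulty * false_negative_rate

print(f"False alarms per day:  {false_alarms_per_day:.0f}")   # ~240 needless callouts
print(f"Missed faults per day: {missed_faults_per_day:.1f}")  # what the safety net must catch
```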

Again, this comes down to the trade-off between a larger, more accurate model, which can take longer to run and consume more power, and a lighter-weight model that may not be as accurate.

Real-world deployment accuracy routinely falls several percentage points below laboratory benchmarks due to noise, occlusion, and domain shift. Quantisation techniques, running models at 8-bit or sub-8-bit precision, can significantly reduce the size and execution time (inference latency) of your model and hence energy consumption whilst keeping accuracy within a few points of full-precision baselines, but only after careful validation.
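
For teams working in TensorFlow, post-training integer quantisation of this kind is only a few lines with the TFLite converter. The saved-model path and calibration data below are placeholders, and the quantised model’s accuracy still has to be re-validated before deployment.

```python
# Post-training full-integer (int8) quantisation with the TFLite converter.
# "saved_model_dir", the input shape and the calibration data are placeholders.
import numpy as np
import tensorflow as tf

# Replace with a few hundred real input windows shaped like the model's input.
calibration_windows = [np.random.randn(1, 1024).astype("float32") for _ in range(100)]

def representative_samples():
    # Lets the converter observe realistic activation ranges for calibration.
    for window in calibration_windows:
        yield [window]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_samples
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```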

In safety-critical domains, AI should be integrated as decision support and should not be the only safety mechanism, with continued focus on traceability, explainability and adherence to standards like ISO 26262 or IEC 61508.
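
One common pattern is to keep a deterministic interlock in charge and treat the model output as advisory. The sketch below is purely illustrative, with assumed thresholds, and is in no way a substitute for a certified safety function developed to ISO 26262 or IEC 61508.

```python
# Sketch: deterministic safety net around an advisory AI prediction.
# The thresholds and the notion of a "fault score" are illustrative assumptions.

HARD_TEMP_LIMIT_C = 95.0    # deterministic interlock threshold (assumed)
AI_ALERT_THRESHOLD = 0.8    # model confidence needed to raise a maintenance flag

def control_decision(measured_temp_c: float, ai_fault_score: float) -> str:
    """The deterministic rule always wins; the model only adds early-warning value."""
    if measured_temp_c >= HARD_TEMP_LIMIT_C:
        return "SHUTDOWN"              # hard-wired rule, independent of the model
    if ai_fault_score >= AI_ALERT_THRESHOLD:
        return "SCHEDULE_MAINTENANCE"  # advisory output, logged and traceable
    return "NORMAL"
```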

Practical hardware and tooling considerations

Don’t pick hardware in a vacuum. Modern Edge AI deployments increasingly rely on heterogeneous compute – that is, combinations of central processing unit (CPU), GPU, DSP, and NPU working together, with the scheduler assigning workloads based on efficiency and latency requirements. A dedicated NPU might handle continuous inference tasks to benefit from its efficient parallel processing capabilities, whilst the CPU manages the more general-purpose control logic and the GPU processes occasional heavier vision workloads.
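
In scheduling terms, that division of labour amounts to routing each class of workload to the engine that handles it most efficiently. The sketch below is a deliberately simplified, hypothetical policy; real schedulers in vendor SDKs or the operating system make this decision with far more information about load, thermals and deadlines.

```python
# Illustrative workload routing across a heterogeneous CPU/GPU/DSP/NPU platform.
# The workload classes and the routing policy are assumptions for illustration.
from enum import Enum, auto

class Engine(Enum):
    CPU = auto()
    GPU = auto()
    DSP = auto()
    NPU = auto()

def route(workload: str, latency_budget_ms: float) -> Engine:
    """Pick the execution engine for a workload under a simple static policy."""
    if workload == "continuous_inference":    # always-on, efficiency-critical
        return Engine.NPU
    if workload == "signal_preprocessing":    # filtering and FFTs on raw sensor data
        return Engine.DSP
    if workload == "batch_vision" and latency_budget_ms > 100:
        return Engine.GPU                     # occasional heavier vision jobs
    return Engine.CPU                         # control logic and everything else

print(route("continuous_inference", 10))      # Engine.NPU
```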

The architectural choice between integrated and add-on accelerators matters too. As mentioned, vendors like Silicon Labs offer microcontrollers and ISM band RF SoCs with built-in AI acceleration suitable for one-dimensional workloads, whilst companies like DeepX provide companion chips and modules that bolt onto existing processors (discretely or via an M.2 plug-in card) when you need more inference capability without redesigning your entire platform. Their proprietary ‘Intelligent Quantisation’ (IQ8) techniques help to deliver the performance levels required of modern Edge AI deployments. Each approach has trade-offs: integrated solutions offer tighter power management and lower latency, whilst add-on accelerators provide upgrade paths and flexibility for evolving models.

Intel’s reference blueprint for Core Ultra processors demonstrates heterogeneous compute in practice: NPU and integrated GPU configurations balance multi-stream computer vision and small generative models with significantly better performance per watt than GPU-only setups. However, raw silicon capability is only half the story.

Tooling, software development kits (SDKs) and ecosystem support often matter as much as the chip specification. Can you easily quantise and optimise your models for the target hardware? Is there robust debugging support? What’s the long-term roadmap?

The skills gap is real: traditional embedded teams often lack experience in data management, model generation and optimisation techniques like quantisation and pruning, and efficient inference deployment on constrained devices. A slightly smaller NPU with a great toolchain and strong community support often beats a monster chip with a poor software ecosystem.

Start with questions, not chips

The winning teams aren’t the ones with the biggest NPU; they’re the ones who frame the right problem, own their data, develop or select realistic models and choose hardware and tools that match those constraints efficiently.

Start by asking the right questions: What problem are you solving? What latency can you tolerate? What data do you have – quality and quantity? What’s your power budget? What safety requirements apply? Only then look at hardware options.

The organisational challenge is real. Research shows that whilst AI investment is widespread, very few organisations consider themselves truly advanced in AI maturity. When applied correctly to well-defined problems, the results speak: predictive maintenance has demonstrated up to 40% cost reductions and 30-50% reliability improvements.

De-risk projects by bringing hardware, software, and firmware engineers together early, using pre-trained models as a starting point where they exist, and simulating performance on different hardware options before committing to silicon. You don’t need to become a data scientist overnight. A disciplined engineering mindset – understanding constraints, trade-offs and verification – is exactly what’s needed to make embedded intelligence work.

If you start with the right questions about latency, bandwidth, power, safety, and data, the AI hardware choices become clearer. Embedded intelligence isn’t about chasing the biggest model; it’s about building systems you can trust, maintain, and ship.

About the author:

Derek Stewart, Business Development Engineer, Solsta
