
This Edge AI processor tackles the monstrous compute demands of generative AI

Kinara, Inc. has announced the launch of its Kinara Ara-2 Edge AI processor, a new product designed to enhance Edge servers and laptops.

Artificial Intelligence

12 December 2023

by News Desk

The Ara-2 processor provides high performance, cost-effectiveness, and energy efficiency, making it ideal for running applications such as video analytics, Large Language Models (LLMs), and other generative AI models. It is also well-suited for Edge applications that use both traditional AI models and state-of-the-art AI models with transformer-based architectures.

The Ara-2 offers an advanced feature set and delivers performance that is 5-8 times greater than its predecessor, the Ara-1 processor. It combines real-time responsiveness with high throughput, balancing on-chip memories and high off-chip bandwidth for executing large models with low latency.

Demand for LLMs and generative AI has grown, but most applications currently rely on GPUs in data centres, which leads to high latency, high costs, and privacy concerns. Kinara’s Ara-2 aims to address these issues by moving the compute to the Edge, closer to the user. It supports the billions of parameters used by generative AI models and eases migration from GPUs through its compute engines and software development kit (SDK), which offer high-accuracy quantisation and direct FP32 support.
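To put "billions of parameters" and quantisation in perspective, the arithmetic below estimates how much memory the weights alone occupy at different numeric precisions. This is only a rough sketch: the 7-billion-parameter size is an illustrative assumption, the precision list is generic rather than a statement of what Ara-2 supports, and the function name is hypothetical.

```python
# Bytes needed to store one weight at each precision (generic values,
# not a list of formats confirmed for any particular processor).
BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: int, precision: str) -> float:
    """Return storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e9


if __name__ == "__main__":
    params = 7_000_000_000  # assumed, illustrative model size
    for precision in BYTES_PER_WEIGHT:
        print(f"{precision}: {weight_footprint_gb(params, precision):.1f} GB")
```

Under these assumptions, moving from FP32 to 8-bit integers cuts the weight storage by a factor of four, which is why quantisation matters so much for fitting large models onto Edge hardware. Activations and runtime buffers are not counted here.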

Ravi Annavajjhala, CEO of Kinara, spoke to Electronic Specifier about the new product: “With Ara-2 added to our family of processors, we can better provide customers with performance and cost options to meet their requirements. For example, Ara-1 is the right solution for smart cameras as well as Edge AI appliances with 2-8 video streams, whereas Ara-2 is strongly suited for handling 16-32+ video streams fed into edge servers, as well as laptops, and even high-end cameras. The Ara-2 enables better object detection, recognition, and tracking by using its advanced compute engines to process higher resolution images more quickly and with significantly higher accuracy. And as an example of its capabilities for processing Generative AI models, Ara-2 can hit roughly 0.5 seconds per iteration for Stable Diffusion and tens of tokens/sec for LLaMA-7B.

“Regarding the generative AI front, we don’t see any competing solutions out there. There are a number of AI accelerators in the market, but none have announced capabilities like Stable Diffusion, for example. We believe our performance in Stable Diffusion is the best in its class. The generative AI front showcases the strength of our software stack and compiler. We’re able to efficiently run language models and later on, diffusion models, on our platform. Clearly, we stand head and shoulders above the rest in terms of performance per watt and dollar, which translates to cost-efficient and power-efficient performance.”
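The quoted figures (roughly 0.5 seconds per Stable Diffusion iteration and tens of tokens per second for LLaMA-7B) can be turned into rough end-to-end times with simple arithmetic. The sketch below does this; the step count, reply length, and the 20 tokens/sec rate are assumptions chosen purely for illustration, not figures from Kinara, and the function names are hypothetical.

```python
def image_time_s(steps: int, seconds_per_iteration: float = 0.5) -> float:
    """Total time for one image if every denoising step takes the same time."""
    return steps * seconds_per_iteration


def reply_time_s(tokens: int, tokens_per_second: float) -> float:
    """Total time to generate a reply at a constant token rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    # Assumed workload sizes, for illustration only.
    print(f"20-step image: {image_time_s(20):.1f} s")
    print(f"200-token reply at 20 tokens/s: {reply_time_s(200, 20.0):.1f} s")
```

With these assumed workloads, both tasks would finish in about ten seconds. Real figures depend on the number of steps, the prompt length, and any overhead outside the iteration loop.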

In October, Ampere welcomed Kinara into the AI Platform Alliance to reduce system complexity and promote collaboration and openness in AI solutions.

Sean Varley, Ampere’s Chief Evangelist, said: “The performance and feature set of Kinara’s Ara-2 is a step in the right direction to help us bring better AI alternatives to the industry than the GPU-based status quo.”

The Ara-2 also emphasises security with features like secure boot, encrypted memory access, and a secure host interface. Kinara supports the Ara-2 with a comprehensive SDK, which includes a model compiler, compute-unit scheduler, flexible quantisation options, a load balancer for multi-chip systems, and a dynamically moderated host runtime.

Ara-2 is available in various forms, including a stand-alone device, a USB module, an M.2 module, and a PCIe card featuring multiple Ara-2 devices. Kinara will demonstrate Ara-2 at CES, and invites interested parties to schedule an appointment in its suite at the Venetian Hotel on January 9, 10, or 11, 2024.