Low-cost development hack for voice codecs in Pi

The modern world is going hands-free: people spend less time tapping keyboards and moving pointing devices, and more time using digital audio-visual communication tools. In addition, the trend towards the Internet of Things (IoT) has led to a proliferation of connected devices that render digital information accessible in a more user-friendly format. Here, David Brooke shares a low-cost development hack for voice codecs in Pi.

21 July 2020

by News Desk

To succeed in the IoT ecosphere, the components inside these devices must be compact and low power, while still offering a high degree of accuracy and functionality. This trend is being aided by the development of small, robust and surface-mountable solid-state microphones constructed using Micro-Electro-Mechanical Systems (MEMS) technology.

Compared with more traditional analogue microphone technologies, MEMS offers an entire gamut of new possibilities for voice-based applications. As a result, compact voice-enabled electronic devices are now beginning to emerge.

For instance, an always-on MEMS microphone used in an intruder alarm can accurately detect the sound of glass breaking, or monitor the ambient noise level for disturbances. The microphones are also small and robust enough to be used in appliances, toys, wearables and other devices designed to offer HD-quality voice interaction. However, the development of voice-based devices is being held back somewhat by a long-standing dearth of high-quality voice digitisation technology.

The benefits of HATs

One of the drawbacks to designing any new device is the prohibitive cost of development, compounded further by shrinking time-to-market cycles. Not only do open-source projects such as the Raspberry Pi (RPi) play an important role for makers, hobbyists and educators, they have also become a vital way for professional engineers to reduce the time and cost of developing electronic products for the commercial market.

The RPi’s Linux-based operating system supports rapid development, testing and prototyping, enabling design teams to get to the proof-of-concept stage much more quickly and at a significantly lower cost than with bespoke development boards and proprietary software. It also provides a good foundation for further development.

The RPi has always supported the attachment of other pieces of hardware, but since the release of the RPi Model B+ in 2014 developers have been able to add more functionality even more easily, thanks to the Raspberry Pi Foundation putting a HAT on it. The acronym stands for Hardware Attached on Top, and it provides a common form factor that anyone can follow to create hardware that plugs directly into the RPi infrastructure (Figure 1).

Figure 1: Hardware Attached on Top (HAT) add-on boards, such as the EV6550DHAT from CML pictured here, are an easy way to test and demonstrate the capabilities of a new design.

RPi HATs connect through the 40-way General Purpose Input/Output (GPIO) header, but the main benefit of these add-on boards is that, through the inclusion of a dedicated EEPROM device, the HAT is instantly recognised by the RPi board. The RPi then automatically configures the GPIO interface as well as any drivers for the HAT, making the process quick and simple in a ‘plug and play’ way.
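As a rough illustration of this auto-detection, Raspberry Pi OS typically exposes the fields read from a HAT's EEPROM as small files under /proc/device-tree/hat. The sketch below reads them back. The helper name read_hat_info is hypothetical, the path and field list are assumptions about a standard Raspberry Pi OS install, and the strings an EV6550DHAT actually reports are not stated in this article.

```python
from pathlib import Path

# Assumed location: Raspberry Pi OS usually publishes HAT EEPROM fields here.
HAT_DIR = Path("/proc/device-tree/hat")

# Field names commonly found in that directory (assumption, not from the article).
HAT_FIELDS = ("vendor", "product", "product_id", "product_ver", "uuid")


def read_hat_info(hat_dir=HAT_DIR):
    """Return the available HAT EEPROM fields as a dict, or {} if none are found."""
    hat_dir = Path(hat_dir)
    if not hat_dir.is_dir():
        return {}
    info = {}
    for field in HAT_FIELDS:
        path = hat_dir / field
        if path.is_file():
            # Device-tree strings are NUL-terminated, so strip the trailing byte.
            raw = path.read_bytes().rstrip(b"\x00")
            info[field] = raw.decode("utf-8", "replace")
    return info


if __name__ == "__main__":
    info = read_hat_info()
    if info:
        for key, value in info.items():
            print(f"{key}: {value}")
    else:
        print("No HAT EEPROM data found")
```

Run on an RPi with a HAT fitted, this should list whatever vendor and product strings the board's EEPROM carries; on any other machine it simply reports that no HAT data was found.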

The need for ultra-low power codecs

When converting voice audio from analogue to digital (ADC) or from digital back to analogue (DAC), designers face an additional challenge: the need to find a suitable codec (COder-DECoder). Unlike codecs for multimedia applications, pure voice codecs have long been a somewhat neglected aspect of audio technology, and this means that many of them can’t offer the quality consumers have come to expect or are simply incompatible with modern MEMS microphones.

MEMS microphones usually produce one of two types of serial output protocols: Pulse Density Modulation (PDM) or Inter-IC Sound (I2S). An internal clock is used to synchronise the digital signal that encodes the analogue input. The initial conversion uses a combination of pre-amplification with PDM modulation within the MEMS microphone itself, the output of which is then processed by the codec.

The codec converts the PDM bitstream into framed data, using decimation filters to recover the low-frequency data from the high-frequency PDM signal. This step can be done with a digital signal processor (DSP), but there are several drawbacks: DSPs can be relatively expensive, their software algorithms can be complicated and, because they are processors running at relatively high clock speeds, the overall power required to perform the task can be prohibitively high for a low-cost, low-power consumer device.
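To make the decimation step concrete, the sketch below encodes a signal as a 1-bit PDM stream with a first-order sigma-delta modulator and then recovers PCM samples by averaging blocks of bits. This is a deliberately crude illustration: the block average stands in for the multi-stage decimation filters a real codec would use, and none of it represents the CMX655D's actual filter design. The function names, the 3.072MHz PDM clock and the decimation ratio of 64 are all assumptions chosen for the example.

```python
import math


def pdm_encode(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> list of 0/1 bits."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        # Accumulate the error between the input and the previous 1-bit output.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pdm_decimate(bits, ratio):
    """Average each block of `ratio` bits into one PCM sample in [-1, 1]."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    pcm = []
    for start in range(0, len(bits) - ratio + 1, ratio):
        ones = sum(bits[start:start + ratio])
        # Map the density of ones (0..1) back to an amplitude (-1..1).
        pcm.append(2.0 * ones / ratio - 1.0)
    return pcm


if __name__ == "__main__":
    pdm_rate = 3_072_000  # assumed PDM clock, not from the article
    ratio = 64            # 3.072MHz / 64 = 48ksamples/s
    tone = [0.5 * math.sin(2 * math.pi * 1000 * n / pdm_rate)
            for n in range(pdm_rate // 100)]
    pcm = pdm_decimate(pdm_encode(tone), ratio)
    print(len(pcm), "PCM samples recovered")
```

Even this simple version shows why the work is costly in software: every output sample requires touching 64 input bits at megahertz rates, which is the kind of load that dedicated hardware filters handle at far lower power.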

By bringing the decimation filters into the hardware of the codec, a much simpler, cheaper and lower-power solution can be created. This is exactly the approach CML Microcircuits has taken with the CMX655D.

Figure 2: simplified block diagram of the CMX655D voice codec

The CMX655D (Figure 2) from CML Microcircuits is an ultra-low power voice codec specially designed to support the latest MEMS microphone technology for always-on digital voice applications, even in battery-powered devices.

Consuming less than 1mA in record mode and just 400µA in listening mode, the codec can perform efficiently under all operating conditions. The chip supports the bandwidths commonly used for both conventional telephony (300Hz to 3.4kHz) and HD voice (50Hz to 7kHz), as well as a full audio band mode (50Hz to 20kHz).
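To put those current figures in context, the sketch below estimates how long the codec alone could run from a small battery. The 400µA and 1mA values come from the figures above (with 1mA treated as an upper bound), while the 220mAh cell capacity, the function names and the idea of a fixed record duty cycle are assumptions for illustration. It ignores the microphones, the host processor, the speaker driver and battery self-discharge.

```python
LISTEN_MA = 0.4       # 400µA listening mode (from the article)
RECORD_MA = 1.0       # record mode upper bound ("less than 1mA" in the article)
CAPACITY_MAH = 220.0  # hypothetical coin-cell capacity, not from the article


def average_current_ma(record_fraction, listen_ma=LISTEN_MA, record_ma=RECORD_MA):
    """Blend listening and record currents by the fraction of time spent recording."""
    if not 0.0 <= record_fraction <= 1.0:
        raise ValueError("record_fraction must be between 0 and 1")
    return record_fraction * record_ma + (1.0 - record_fraction) * listen_ma


def battery_life_hours(capacity_mah, current_ma):
    """Ideal runtime in hours, ignoring self-discharge and all other loads."""
    if current_ma <= 0:
        raise ValueError("current_ma must be positive")
    return capacity_mah / current_ma


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 1.0):
        current = average_current_ma(fraction)
        hours = battery_life_hours(CAPACITY_MAH, current)
        print(f"recording {fraction:.0%} of the time: {hours:.0f} hours")
```

With these assumed numbers, a codec that only ever listens would draw the hypothetical cell down in roughly 550 hours, which is the kind of budget that makes always-on operation plausible in battery-powered devices.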

In addition, the device supports noise cancellation applications by interfacing simultaneously with two microphones and keeping the two signal paths phase-matched throughout the device. The CMX655D package also includes an integrated high-efficiency Class D speaker driver, providing an output of up to 1W. This functionality is generally not available with other devices, including general purpose DSPs, and so would typically require an additional external IC.

Enabling shorter development cycles

With all the flexibility of the RPi HATs in mind, CML decided to create an evaluation board for the CMX655D codec solution in HAT form, giving the Raspberry Pi community easy access to this new technology. The EV6550DHAT exposes all the features and benefits of the codec. It is suitable for anyone, from hobbyists and makers to professional developers and product manufacturers.

The EV6550DHAT follows the Raspberry Pi Foundation’s common HAT format (Figure 3), enabling it to be connected via the 40-pin GPIO extended connector to the top of an RPi Model B+ with no modifications.

Figure 3: the EV6550DHAT GPIO signal path routing diagram

The HAT can be powered entirely by the Raspberry Pi, up to a total of 50mA. It also supports an alternative 3V3 supply through the RPi GPIO 3.3V pins, fed via a decoupling inductor to reduce the digital noise from the RPi supply. If other HATs or peripherals are in use at the same time, limiting the total power available from the host USB interface, the board design also supports an external power source via links and pads.

The add-on board features two top-ported digital MEMS microphones, an easy screw connection for an external speaker, accessible GPIO and test pads to accommodate additional functionality. The simple GUI with open-source drivers allows the user to control, demonstrate and also assess the functionality of the audio conversion. This creates a platform that is convenient and inexpensive to use, further simplifying product development.

The GUI offers engineers the ability to record and play back .wav files, as well as access to pre-recorded sample files which can be played back at adjustable rates and levels to provide easy comparisons between settings. These features also help engineers understand how the settings influence audio quality. The sample rate can be set to 8, 16, 32 or 48ksamples/s, with input audio gain adjustable from -12dB to +3dB. Playback volume can be set between -90dB and 0dB, and muting and ‘smoothing’ functions are also available.
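For readers who want to prepare their own comparison files, the sketch below shows how a decibel setting maps to an amplitude scale factor and writes a test tone to a .wav file at one of the sample rates listed above. It uses only the Python standard library and is not the EV6550DHAT GUI or its drivers; the function names, the 1kHz tone and the 16-bit mono format are assumptions made for illustration.

```python
import math
import struct
import wave

SAMPLE_RATES = (8000, 16000, 32000, 48000)  # rates listed in the article
PLAYBACK_RANGE_DB = (-90.0, 0.0)            # playback volume range from the article


def db_to_linear(gain_db):
    """Convert a gain in dB to an amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


def write_tone(path, sample_rate, volume_db, freq_hz=1000.0, seconds=0.5):
    """Write a 16-bit mono sine tone to `path` and return the frame count."""
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}")
    low, high = PLAYBACK_RANGE_DB
    volume_db = max(low, min(high, volume_db))  # clamp to the supported range
    amplitude = 32767 * db_to_linear(volume_db)
    frame_count = int(sample_rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(round(
            amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))))
        for i in range(frame_count)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return frame_count


if __name__ == "__main__":
    for rate in SAMPLE_RATES:
        write_tone(f"tone_{rate}.wav", rate, volume_db=-6.0)
```

Because the scale is logarithmic, each 20dB step down divides the amplitude by ten, so the bottom of a -90dB range is effectively silence on a 16-bit file.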

Conclusion

The Raspberry Pi has become much more than a way to promote technology to a new and younger audience; it offers even seasoned engineers greater access to the very latest integrated solutions. Its robust design and excellent community support mean it is now also attractive to semiconductor companies like CML.

Through the RPi, the EV6550DHAT provides a low-cost way to evaluate the highly versatile, ultra-low power CMX655D single-chip voice codec solution. This will give OEMs the ability to more easily develop a simplified but high-performance signal chain for a wide range of voice-activated and voice-interaction products that can be in an always-on or permanent standby mode without drawing excessive power.

By making the solution easily modifiable and accessible to the engineering community at large, it is not hard to imagine that this much-needed voice codec will become an integral part of some of the most innovative voice-based devices of the near future.