I’ve already been experimenting with DIY smart speakers and corresponding services, for example using a ReSpeaker board with the Microsoft Bing Speech API, or an Orange Pi Zero with the Google Assistant SDK. But so far all the hardware platforms I used came with a single microphone, and no microphone array to help with wake word detection in noisy environments.
Last weekend, I received the Espressif Audio Mic HDK, an ESP32 board with a 3-microphone array, which I’ll review in a few weeks once documentation becomes available and I clear some other items in my review list. In the meantime, I checked out the hardware, and found out the mainboard also comes with a Microsemi ZL38063 audio processor specifically designed for microphone arrays. The chip was released last year, and can already be found in the company’s AcuEdge Development Kit for Amazon AVS, but since I’m going to use a board based on ZL38063, I thought I’d have a closer look to better understand its capabilities.
- DSP with voice hardware accelerators
- Microphone Array Configuration
  - 2-3 microphones linear array for 180° audio pick up
  - 3 microphone off-axis or triangular arrays for 360° audio pick up
  - Device cascading allows up to a 6 microphones array for both 180° & 360° audio pick up
  - Supports 360° sound location estimation
- Full Narrowband and Wideband Acoustic Echo Cancellation (AEC) operation
- Noise Reduction
- Far field microphone processing
- Two way communication
- ASR (Automatic Speech Recognition) Assist Algorithms
  - Supports the microphone configurations listed above
  - Support for barge-in, or incoming trigger ‘spotting’ in the presence of DAC audio output
  - Enhanced far-field support for distances up to 16 feet (~5 meters) from the microphone
- Host interface – I2C or SPI
- Debugging – Via UART interface
- Storage – SPI flash
- Expansion – 14 GPIOs
- Power Modes – 2 low power modes controlled by reset
- ZL38063LDF1 – 64-pin QFN (9×9) package (Tape and Reel)
- ZL38063LDG1 – 64-pin QFN (9×9) package (Tray)
- ZL38063UGB2 – 56-ball WLCSP (3.1×3.1) package (Tape and Reel)
So basically, the chip processes audio from multiple microphones (2 to 6), and detects voice commands.
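To get a feel for what "sound location estimation" from a microphone array involves, here is a minimal sketch of the textbook time-difference-of-arrival approach with two microphones. This is not Microsemi’s algorithm, which is not documented publicly. The function names, the 5 cm microphone spacing and the 16 kHz sample rate used below are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def best_lag(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means the sound reaches `other` later than `ref`.
    Brute-force cross-correlation, fine for short illustrative buffers.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += sample * other[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle(lag, sample_rate, mic_spacing):
    """Convert a lag into an angle in degrees (0 = straight ahead of the pair)."""
    delay = lag / sample_rate
    # Clamp the ratio, since noise can yield a delay slightly beyond the physical limit
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


# Hypothetical capture: the same short pulse reaches mic B two samples after mic A
mic_a = [0] * 10 + [1, 2, 1] + [0] * 10
mic_b = [0] * 12 + [1, 2, 1] + [0] * 8
lag = best_lag(mic_a, mic_b, max_lag=4)
angle = arrival_angle(lag, sample_rate=16000, mic_spacing=0.05)
```

A two-microphone pair can only resolve angles within a half-plane, which is consistent with linear arrays being listed for 180° pick-up and triangular arrays for 360°.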
The block diagram also reveals the chip can be connected to speakers, and interfaces with the host processor over I2S/PCM to send PCM audio, and over I2C or SPI for commands. It also supports an optional external SPI flash.
The chip supports Microsemi AcuEdge technology, which consists of license-free, royalty-free intelligent audio IP algorithms, and the firmware can be upgraded to support specific modes of operation. The company also provides a reference design, and the (ZLS38508) MiTuner GUI software to interactively configure the ZL38063 device, with the following options:
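On the host side, audio received over I2S typically ends up in a buffer of interleaved left/right samples. The sketch below shows how such a buffer could be split into channels, assuming 16-bit little-endian stereo PCM. The actual sample width, channel count and byte order depend on how the ZL38063 and the host’s I2S peripheral are configured, and `split_stereo_pcm16` is a hypothetical helper name.

```python
import struct


def split_stereo_pcm16(buffer):
    """Split interleaved 16-bit little-endian stereo PCM into two sample lists."""
    if len(buffer) % 4:
        raise ValueError("buffer must hold whole stereo frames (4 bytes each)")
    # "<" = little-endian, "h" = signed 16-bit integer, one per sample
    samples = struct.unpack("<%dh" % (len(buffer) // 2), buffer)
    # Even indices are the left channel, odd indices the right channel
    return list(samples[0::2]), list(samples[1::2])


# Hypothetical two-frame buffer: (left=1, right=-1), (left=2, right=-2)
raw = struct.pack("<4h", 1, -1, 2, -2)
left, right = split_stereo_pcm16(raw)
```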
- Auto Tuning and Subjective Tuning support
- Provides visual representations of the audio paths with drop-down menus to program parameters, allowing:
  - Control of the audio routing configuration
  - Programming of key blocks in the transmit (Tx) and receive (Rx) audio paths
  - Setting analog and digital gains
- Configuration parameters allow users to “fine tune” the overall performance
Applications for such a chip include not only smart speakers, but also smart home gateways, set-top boxes, and digital assistants, basically anything that may require voice command detection. You’ll find documentation and software on the product page, but most of it is only accessible after registration. Registration is free, but you need a company or university email address, and after email confirmation you have to wait for the company to accept your application.
Back to the Espressif Audio Mic HDK. Espressif claims keyword recognition runs on the Espressif ESP32 module, so that means the Microsemi ZL38063 chip does not perform the wake word detection by itself, but instead assists the host processor by processing the audio with specific algorithms that make detection more accurate on the host processor.