A First Look at ESP32-LyraTD-MSC Audio Mic HDK with Baidu DuerOS Assistant

Earlier this year, Espressif Systems unveiled their ESP32-LyraTD-MSC Audio Mic HDK (Hardware Development Kit), which features an ESP32-WROVER module, a 4-mic array DSP, 3 microphones, an audio jack, and various I/Os.

I received the board a couple of weeks ago, and while there’s no public information released yet, the company provided me with the ESP32-LyraTD-MSC User Guide in English. Eventually, I’d expect Google Assistant and Amazon Alexa to be supported, but in the meantime I had to leverage my (lowly) Chinese language skills to get started, since the kit is pre-loaded with firmware connecting to the Baidu DuerOS voice assistant.

ESP32-LyraTD-MSC Unboxing

The kit came in a bland Espressif Systems carton box.

Inside the package, I could only find one kit comprised of two boards.

The bottom board reads ESP32_MicrosemiDSP_Mainboard-V1, and does not show much apart from markings for connectors, headers and the power switch.

The top board comes with eight buttons (Vol +, Vol -, Mode, Boot, RST, Rec, Play, and Set), three microphones, and some configuration switches, which you may not want to touch at first…

We can take the two boards apart to check out the mainboard, and the ESP32_MicrosemiDSP_SubBoard_V1 with the microphones and buttons, which includes a chip marked “N1309-3216”.

If we have a closer look at the main board, we’ll find an ESP32-WROVER module, a Microsemi ZL38063 audio processor which processes the audio from the microphones and assists the ESP32 with wake word recognition, as well as a CP2102N chip for debugging. We also have a micro SD card slot, two micro USB ports (one for power, one for UART), an audio jack to connect a speaker, an on/off switch, and various headers for I/O and debugging (e.g. JTAG).

Testing Espressif Systems ESP32 Audio Mic HDK with Baidu DuerOS

At this stage there’s actually little you can do due to the lack of documentation, but I was still able to test the hardware with the Baidu DuerOS assistant. The first part of the user manual tells you to flash the firmware, but the requested files are nowhere to be found. Luckily, the board was pre-loaded with some version of it.

So the first thing I had to do was connect a USB power supply to the POWER micro USB port, as well as a pair of speakers to the audio jack. If you plan to modify and flash the firmware (once it becomes available), you’ll also need to connect a micro USB to USB cable between your (Windows) computer and the UART micro USB port.
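Once the UART port is plugged in, the CP2102N bridge should show up as a serial port on the computer. The sketch below is one way to spot it, assuming the chip still reports the default Silicon Labs CP210x USB IDs (these can be reprogrammed, so I can’t promise this board uses them). The helper name find_cp210x_ports is made up for illustration, pyserial is a third-party package, and the fallback port entries are invented so the example runs without hardware.

```python
from types import SimpleNamespace

# Default USB IDs of Silicon Labs CP210x bridges (assumed unchanged on this kit).
CP210X_VID = 0x10C4
CP210X_PID = 0xEA60


def find_cp210x_ports(ports):
    """Return the device names of ports whose USB IDs match a CP210x bridge.

    `ports` is any iterable of objects exposing `device`, `vid` and `pid`,
    which is what pyserial's list_ports.comports() yields.
    """
    return [
        p.device
        for p in ports
        if getattr(p, "vid", None) == CP210X_VID
        and getattr(p, "pid", None) == CP210X_PID
    ]


if __name__ == "__main__":
    try:
        # Real enumeration, only if pyserial is installed (pip install pyserial).
        from serial.tools import list_ports
        candidates = find_cp210x_ports(list_ports.comports())
    except ImportError:
        # Invented entries so the sketch still runs without pyserial.
        fake_ports = [
            SimpleNamespace(device="COM3", vid=0x10C4, pid=0xEA60),
            SimpleNamespace(device="COM1", vid=None, pid=None),
        ]
        candidates = find_cp210x_ports(fake_ports)
    print(candidates)
```

Any port it returns can then be opened with a serial terminal. 115200 baud is the usual default for the ESP32 console, although the pre-loaded firmware may well use something else.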

Now change the power switch to ON, and for the first boot, you should see the blue LED blink. Press the SET button for a few seconds until the board utters something in Chinese (which I could not understand), and install & run IOT Espressif for Android (apk) or ESP-TOUCH for iOS on your smartphone. Skip all the initial steps, and tap on the top left icon, select Add devices, input your WiFi password, and click OK.

After a few seconds, you should see one item added to the “Connected to WiFi Device List”, meaning the kit is now a client on your WiFi network. The blue LED should now be on at all times (no blinking).

Now we can try the voice assistant with the “Alexa” wake word, which will cause the board to reply “您好！有什么吩咐” (nin hao! you shenme fenfu), which translates to “Hello! How can I help you?”. We can then repeat “Alexa” followed by our request in Chinese. I tried asking for the time and the weather, and to play music in the video below.

The assistant combines female and kid voices for interaction. I actually added one MP3 and one FLAC audio file to the micro SD card hoping it would start playing them, but instead it started some music from the net.

Microsemi ZL38063 Documentation & Tools

That’s all I could do for now, as we’ll need to get more documentation and some source code from Espressif Systems to further experiment with the platform. Although not compulsory, you may also be interested in ZL38063 audio processor resources, since it interfaces with the ESP32 over SPI for commands and I2S for audio. It may be necessary to change the wake word for example, although Espressif Systems mentioned they could do that themselves, and they’d just need 5,000 audio samples of the wake/hot word. Most of the documentation and software tools are not public, so you’d need to request access to those with a company email address.
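To give a rough idea of what the I2S audio link between the ZL38063 and the ESP32 has to carry, here is a small back-of-the-envelope calculation. The sample rate, word size and slot count are assumptions picked for illustration (16 kHz, 16-bit, two slots is a common choice for voice), not values taken from the ZL38063 documentation, and the function names are my own.

```python
def i2s_bit_clock_hz(sample_rate_hz, bits_per_slot, slots=2):
    """Bit clock needed to move `slots` words of `bits_per_slot` bits per frame."""
    return sample_rate_hz * bits_per_slot * slots


def i2s_bytes_per_second(sample_rate_hz, bits_per_slot, slots=2):
    """Raw audio payload carried by the link, in bytes per second."""
    return i2s_bit_clock_hz(sample_rate_hz, bits_per_slot, slots) // 8


if __name__ == "__main__":
    # Assumed voice-capture format: 16 kHz, 16-bit samples, 2 slots per frame.
    bclk = i2s_bit_clock_hz(16_000, 16, slots=2)
    rate = i2s_bytes_per_second(16_000, 16, slots=2)
    print(f"BCLK: {bclk} Hz")        # BCLK: 512000 Hz
    print(f"Payload: {rate} bytes/s")  # Payload: 64000 bytes/s
```

With those assumed numbers the audio stream is tiny by ESP32 standards, which fits with SPI only being needed for commands and configuration rather than for the audio itself.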

To my surprise, I managed to access the files using my website address, but sadly can’t share anything since none of the files are publicly available. The process is somewhat cumbersome, as you need to get approval for the account first, which takes a few days, then request access to documentation, which takes another day or two. There’s a separate login for software, and registration to “Microsemi Software Delivery System (SDS)” is automatic, but again you need to request access to each software/firmware package individually, which in my case was accepted within 24 hours. It would be good if Espressif Systems and/or Microsemi themselves could make it easier for developers to access those resources for a processor that was released in 2015. Some documentation for the ZL38063-based Microsemi AcuEdge Development Kit for Amazon AVS (ZLK38AVS) can be found on GitHub, but I’m not sure whether much of it is usable for the Espressif development kit.

Espressif Audio Mic HDK is not for sale just yet, but the company has sent the kit to several developers, so we should expect some progress in the weeks or months ahead. I’ll likely check it out again once an English voice assistant is made to work, and more resources are made public.

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.