Espressif ESP-Skainet Voice Assistant Offers Wake Word Engine and Speech Commands Recognition for Embedded MCUs

Skynet is finally here! OK, not quite, but at least we do have ESP-Skainet now courtesy of Espressif Systems. ESP-Skainet is an intelligent voice assistant that features the company’s WakeNet wake word engine and MultiNet speech commands recognition.


WakeNet has been designed specifically for low-power MCUs such as ESP8266 or ESP32. It has a low memory footprint (20KB RAM) and a high calculation speed, which lets it achieve a high wake word detection rate even in noisy environments.

WakeNet in LyraT Mini Board

Tested on the company’s upcoming LyraT-Mini audio board, which combines an ESP32-WROVER-B module and a codec, WakeNet achieves a 97% wake word success rate at a distance of one meter, and 95% at three meters, in a quiet environment.

The ESP-Skainet wake-up engine ships with the wake word “嗨乐鑫” (Hi Lexin), which translates to “Hello Espressif”, and supports up to five wake words. You can use custom wake words as well, but you’d have to go through Espressif Systems to enable this customization.



Once your smart audio device (aka smart speaker) has been woken up by the wake word, it listens to what you have to say and processes the audio with MultiNet speech commands recognition. MultiNet currently works in Chinese (Mandarin) only, but English support is coming in the next release.

The company further explains how it works internally:

MultiNet’s design draws on Convolutional Recurrent Neural Networks (CRNN) and Connectionist Temporal Classification (CTC). MultiNet uses an audio clip’s Mel-Frequency Cepstral Coefficients (MFCC) as input, and the phonemes of that audio clip, which could be either in Chinese or in English, as output. By comparing the output’s phonemes, MultiNet can identify the relevant Chinese or English command.
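To make the phoneme-matching idea more concrete, here is a minimal Python sketch of the last two steps: collapsing per-frame CTC labels into a phoneme sequence, then picking the closest known command. This is not Espressif's code. The blank symbol, the pinyin-like phoneme labels, the command table, and the use of edit distance for the "comparing" step are all assumptions made for illustration, and the neural network that would produce the per-frame labels from MFCC features is left out entirely.

```python
from typing import Dict, List, Optional, Sequence

BLANK = "_"  # assumed symbol for the CTC "blank" label


def ctc_greedy_collapse(frame_labels: Sequence[str]) -> List[str]:
    """Standard CTC greedy decoding: merge repeated labels, then drop blanks."""
    phonemes: List[str] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            phonemes.append(label)
        prev = label
    return phonemes


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev_row = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        row = [i]
        for j, y in enumerate(b, 1):
            row.append(min(prev_row[j] + 1,             # deletion
                           row[j - 1] + 1,              # insertion
                           prev_row[j - 1] + (x != y)))  # substitution
        prev_row = row
    return prev_row[-1]


def match_command(phonemes: Sequence[str],
                  commands: Dict[str, List[str]],
                  max_distance: int = 1) -> Optional[str]:
    """Return the closest command, or None if nothing is within max_distance."""
    best_name, best_dist = None, max_distance + 1
    for name, reference in commands.items():
        dist = edit_distance(phonemes, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical command table: each command is just a phoneme sequence.
COMMANDS = {
    "turn_on_light": ["d", "a", "k", "ai", "d", "eng"],  # "da kai deng"
    "turn_off_light": ["g", "uan", "d", "eng"],          # "guan deng"
}

if __name__ == "__main__":
    # Made-up per-frame labels standing in for the network output.
    frames = ["_", "d", "d", "_", "a", "a", "k", "_",
              "ai", "ai", "d", "_", "eng", "eng", "_"]
    print(match_command(ctc_greedy_collapse(frames), COMMANDS))
```

In a scheme like this, a command is only an entry in a phoneme table, which is one plausible reading of why adding a custom command would not require retraining the model.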

At this stage, up to 100 spoken commands in Chinese, including customized ones, are supported. Customizing voice commands does not require you to train the model again, and access to the network is not needed. Note that while WakeNet requires only 20KB RAM, ESP-Skainet (MultiNet) works on ESP8266 or ESP32 modules with at least 4MB SPI RAM.

More details may be found in the press release and GitHub repository.


1 Comment
Jon Smirl
1 year ago

Who has the hardware to give this a whirl? Doesn’t look like the Olimex ESP32-ADF board is available in the US yet. Any other hardware options that I don’t have to wait three weeks on the shipping?