Espressif ESP-SR is a speech recognition framework enabling on-device speech recognition on ESP32 and ESP32-S3 wireless microcontrollers with the latter being recommended due to its vector extension for AI acceleration and larger, high-speech octal SPI PSRAM.
The ESP-SR framework was first released on December 17, 2021 with version 1.0, before the v1.20 update was introduced in March of this year, but I only found out about ESP-SR offline speech recognition solution through a tweet by John Lee showing an ESP-SR demo video by @ThatProject.
Comrades of the world, liberate your hands from the chains of typing and touching germy switches! Embrace the revolutionary power of speech recognition with ESP32-S3 + ESP-SR. Let your words flow freely, for the proletariat shall not be silenced by keyboards or bourgeois input… pic.twitter.com/bm3udteB3o
— John Lee (@EspressifSystem) July 15, 2023
I initially was confused since ESP32 boards have supported speech recognition for years using the ESP-ADF framework. But the key difference is that the latter relies on online voice assistants such as Baidu DuerOS, Amazon Alexa, and Google Assistant, while the relatively new ESP-SR does that locally directly on the ESP32 CPU, so you don’t even need a network connection for this to work. We’ve written about various offline voice recognition modules in the last few years, and I didn’t know this was already implemented on the ESP32 chips.
The GitHub repository for ESP-SR lists four main components:
- Audio Front-end AFE
- WakeNet Wake Word Engine
- MultiNet Speech Command Word Recognition
- Speech Synthesis (only supports the Chinese language at this time)
If some of the components above ring a bell, that’s because they are existing solutions and we covered the ESP-AFE algorithms when they become Alexa certified, while WakeNet and MultiNet are part of the ESP-SKAINET assistant introduced in 2019. What appears to be new are test apps for speech recognition and text-to-speech conversion that were committed just 3 to 5 days ago.
So it looks like the ESP-SR simply combines all those different projects as components to help with integration into customers’ projects. You’ll find documentation on the Espressif website, and the company recommends the ESP32-S3-Korvo-1 or ESP32-S3-Korvo-2 development boards to get started although I’d assume it should probably work on other ESP32-S3 smart audio devkits with microphones such as the ESP32-S3-BOX as well.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.