Espressif has just introduced the ESP Private Agents platform to help developers build local, private, and customizable AI assistants that run on-device on ESP32 hardware, although the platform can also support hybrid AI workloads that mix on-device and cloud processing.
The ESP Private Agents platform offers a unified framework that allows developers to build applications combining speed, vision, automation, and agent-based interactions, for example, a multi-lingual, on-device voice agent (aka smart speaker) or task-oriented agents that automate workflows.
The solution is built on AWS cloud services, using AWS Fargate as the primary application platform and Amazon Bedrock Foundation Models as backend LLM systems. It not only works with ESP32-powered devices with a speaker and microphone, but also with mobile apps and web clients.
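To illustrate the backend side of this architecture, here is a minimal sketch of how a service (e.g. running on AWS Fargate) might prepare a request for a Bedrock foundation model through boto3's Converse API. The model ID, prompt wording, and inference settings are illustrative assumptions, not details from Espressif's implementation.

```python
import json

def build_bedrock_request(system_prompt: str, user_text: str,
                          model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> dict:
    """Build keyword arguments for boto3's bedrock-runtime Converse API.

    The model ID and inference settings below are placeholders; a real
    deployment would pick one of the Bedrock Foundation Models the
    platform supports.
    """
    return {
        "modelId": model_id,
        # The agent's persona (see "System Prompt" below) goes here.
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": 256, "temperature": 0.5},
    }

# The actual call would then be something like:
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**build_bedrock_request(...))
req = build_bedrock_request("You are a friendly voice assistant.", "Turn on the light")
print(json.dumps(req, indent=2))
```

Separating request construction from the network call like this also makes the payload easy to unit-test without AWS credentials.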

Espressif released a Web-based demo, which you can use as a text-based chatbot or as a voice assistant leveraging the speaker and microphone on your computer. The company said that for production cases, customers can deploy the solution to their own AWS account. I gave it a try in Firefox on Ubuntu 24.04. After logging in through ESP Rainmaker, I was able to use the chatbot just fine.
When I clicked on the microphone to switch audio on, the assistant repeated its answers in voice form, but it was unable to hear me despite detecting my microphone. I could click on the microphone, talk, and press the stop button to send the audio, but nothing happened. The microphone works on my computer, so maybe ESP Private Agents don’t play nice with Firefox.
Nevertheless, here’s a much more interesting demo using the EchoEar hardware as a multi-lingual AI voice assistant speaking in English, Hindi (TBC), German, and Spanish as different speakers take turns.
The announcement on Espressif’s developer blog explains in more detail the steps required to create your own AI agent and related hardware. Here’s a summary.
Creating an AI agent:
- LLM Selection – Choose from a range of supported AWS Bedrock Foundation Models, each with its own performance, cost, and behavior.
- System Prompt – Defines the agent’s behavior and establishes its persona, such as a voice controller, storyteller, or customer support assistant.
- Tools – Pluggable actions that an agent can invoke to perform specific tasks, for example, ESP RainMaker control, volume control, and emotion detection. Two types of tools are available:
  - Remote Tools compatible with the Model Context Protocol (MCP)
  - Local Tools executed directly on the client, such as the IoT device itself or a companion mobile application. One example would be turning a light on or adjusting a cooling fan speed.
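The local tools described above can be sketched as a simple name-to-handler registry on the client. The tool names, arguments, and return values here are hypothetical, for illustration only, and not the actual ESP Private Agents API.

```python
from typing import Callable, Dict

class LocalToolRegistry:
    """Maps tool names to handlers executed directly on the client,
    mirroring the 'Local Tools' concept (device or companion app actions)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, handler: Callable[..., str]) -> None:
        self._tools[name] = handler

    def invoke(self, name: str, **kwargs) -> str:
        if name not in self._tools:
            return f"unknown tool: {name}"
        return self._tools[name](**kwargs)

registry = LocalToolRegistry()
# Hypothetical device actions, like the light and fan examples above.
registry.register("set_light", lambda state: f"light turned {state}")
registry.register("set_fan_speed", lambda percent: f"fan speed set to {percent}%")

# The agent's LLM would emit a tool call such as
# {"tool": "set_light", "state": "on"}, which the client dispatches locally:
print(registry.invoke("set_light", state="on"))      # → light turned on
print(registry.invoke("set_fan_speed", percent=60))  # → fan speed set to 60%
```

Remote (MCP) tools would follow the same invoke-by-name pattern, except the call crosses the network to an MCP server instead of running on the device.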
Once an agent is defined, you can test it directly from the web dashboard. Once you are satisfied with the results, you can carry on with development on real hardware using one of the three supported development kits: EchoEar, ESP32-S3-Box, or M5Stack CoreS3, and perform the following steps:
- Program the firmware – The solution generates the source code and a binary for the firmware, which you can flash from your web browser. Two types of firmware are available for now: a generic assistant or a voice-controlled Matter controller with Thread support. More details can be found on GitHub.
- Provision the device using the ESP RainMaker Home app
- Configure a new agent on the device (optional) – Change the default agent running on the device by scanning a QR code
- Interact with the device using voice
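For the firmware programming step, Espressif chips can also typically be flashed from the command line with esptool instead of the browser-based flasher. The sketch below only builds the command; the chip name, serial port, flash offset, and file name are illustrative assumptions, since the real offsets depend on the generated firmware's partition layout.

```python
def esptool_flash_cmd(chip: str, port: str, images: dict) -> list:
    """Build an esptool write_flash command from an offset -> file mapping.

    Example (all values hypothetical):
        esptool_flash_cmd("esp32s3", "/dev/ttyUSB0", {"0x0": "agent_firmware.bin"})
    """
    cmd = ["esptool.py", "--chip", chip, "--port", port, "write_flash"]
    for offset, path in sorted(images.items()):
        cmd += [offset, path]
    return cmd

cmd = esptool_flash_cmd("esp32s3", "/dev/ttyUSB0", {"0x0": "agent_firmware.bin"})
print(" ".join(cmd))
# One would then execute it with subprocess.run(cmd, check=True)
```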
Visit agents.espressif.com to get started.

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager to write daily news and reviews full time later in 2011.
Support CNX Software! Donate via cryptocurrencies, become a Patron on Patreon, or purchase goods on Amazon or Aliexpress. We also use affiliate links in articles to earn commissions if you make a purchase after clicking on those links.

Better to use Home Assistant for local voice control https://www.home-assistant.io/voice_control/
https://www.home-assistant.io/voice_control/voice_remote_local_assistant/
You have those options for local control, but they really aren’t very good or require you to keep a large PC running. As of today, there are no practical, good solutions for local control.
This, https://www.firefly.store/products/rk182x-3d-ram-stacking-development-kit is a good solution for doing local speech processing. However, I don’t understand why this board is $809.
Before RAM prices went crazy, you could buy an AMD AI MAX 395 PRO box for $995, and that’s a way better choice than the $809 Firefly board. But with crazy RAM prices, those have jumped to $1500-2000.
So the real question is, what is an RK1820 chip going to cost? I am hoping it is under $50. If that is the case local speech processing becomes viable. Does anyone have a clue on pricing for this chip?
There’s some discussion about it at https://www.cnx-software.com/2025/07/18/rockchip-unveils-rk3668-10-core-arm-cortex-a730-cortex-a530-soc-with-16-tops-npu-rk182x-llm-vlm-co-processor/#comment-656198
The RK1820/RK1828 will mostly be useful for LLMs, as computer vision performance is lower than on the RK3588, based on the benchmarks shared there.
Pricewise, the AIO-GS1N2 is not new, and the company sells the equivalent model with the Jetson Orin Nano 8GB (and no PoE) for $779. Since the RK182x devkits are not in stock, I’m not sure the listed prices are relevant. Considering the Jetson Orin Nano 8GB module is sold for $249 (on Arrow), the $1,029 RK1828 and $889 RK1820 devkit prices would imply something like $499 and $359 for the respective Rockchip modules, which doesn’t seem realistic…
Commenting on the actual ESP Private Agent, this is a better choice for privacy than using the Home Assistant Cloud option. With ESP Private Agent you can set up your own private processing at AWS instead of using Home Assistant Cloud’s product.
Private cloud? YMMD! xD