Hiwonder has introduced the WonderLLM, an ESP32-S3-based smart chat module that combines a 2MP camera, a 2.0-inch touch display, a speaker, and a microphone array to support both offline computer vision tasks and cloud-based Large Language Models (LLMs) via the XiaoZhi AI platform.
The device ships with a dedicated voice chip (CI1302) that enables always-on wake-word detection, and a 4-pin I2C interface allows it to act as a smart vision/voice sensor for external controllers like Arduino, STM32, or other ESP32 boards. Typical applications include small robots, STEM education kits, interactive assistants, and vision-enabled projects where the module handles perception and interaction while another MCU manages motion and control logic.
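To illustrate the sensor role described above, here is a minimal Python sketch of how a host might decode a detection result read over I2C. The 9-byte frame layout, the `Detection` type, and the `decode_detection` helper are all hypothetical and invented for illustration only; the real register map and packet format are defined in Hiwonder's protocol documentation.

```python
import struct
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    """Bounding box of a detected object, in pixels (hypothetical fields)."""
    x: int
    y: int
    width: int
    height: int


# HYPOTHETICAL frame layout, not taken from Hiwonder's documentation:
# 1 status byte (0 = nothing detected) followed by four little-endian uint16 values.
_FRAME = struct.Struct("<B4H")


def decode_detection(raw: bytes) -> Optional[Detection]:
    """Decode one result frame; return None when nothing was detected."""
    if len(raw) != _FRAME.size:
        raise ValueError(f"expected {_FRAME.size} bytes, got {len(raw)}")
    found, x, y, width, height = _FRAME.unpack(raw)
    return Detection(x, y, width, height) if found else None
```

On real hardware the raw bytes would come from an I2C read on the host MCU, and a motion controller could then steer a robot toward the center of the returned box.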
Hiwonder WonderLLM specifications:
- Wireless Module – ESP32-S3-WROOM-1
  - SoC – Espressif Systems ESP32-S3
  - CPU – Dual-core Tensilica LX7 up to 240 MHz with vector extension for AI/ML workloads
  - RAM – 512KB SRAM
  - Storage – TBD
  - Wireless – WiFi 4 and Bluetooth LE 5
  - Antenna – PCB antenna
- Display – 2.0-inch LCD screen
- Camera – 2MP fixed-focus camera with 123° Field of View (FOV)
- Audio
  - Built-in high-fidelity microphone
  - Built-in speaker for voice feedback
  - Dedicated Voice Chip (CI1302) for low-power wake-word detection
- USB – 2x USB Type-C ports
  - Top port – ESP32-S3 firmware flashing and power
  - Bottom port – Voice chip (CI1302) firmware flashing
- Expansion – 4-pin I2C connector (HY2.0-4P style) for external MCU communication
- Misc
  - Mode switch button
  - Wake-up button
  - M3 mounting holes (34mm x 50mm spacing on back)
- Power – 5V DC via USB-C
- Dimensions – 60 x 54 x 22 mm
- Weight – 46 grams


The LLM in the name suggests that it does local processing, but the ESP32-S3 obviously doesn’t have the RAM to run models like Qwen or DeepSeek locally. Instead, the module acts as an intelligent client, handling wake-words and audio processing locally before offloading queries to the cloud. However, it does support offline AI/computer vision workloads like face detection, color tracking, and line following directly on the device.
In terms of software support, WonderLLM comes with both factory and customizable firmware for the CI1302 speech chip and the ESP32-S3 controller, along with various tools for flashing, debugging, and further development. It supports offline speech recognition, offline vision processing, and cloud-based large model features. Vision tasks are programmed with Arduino, external hosts such as ESP32, Arduino, STM32, and BBC micro:bit boards control the module over I²C, and Model Context Protocol (MCP) integration uses JSON messages. Detailed documentation (in source/docs), firmware images, protocol lists, utilities, and example projects are available on the relevant GitHub repository.
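For readers unfamiliar with MCP, its messages follow JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The Python sketch below shows what building such a request and reading a text reply could look like. The helper names `make_tool_call` and `parse_tool_result` and the tool name `set_motor_speed` are hypothetical; the tools actually exposed by the WonderLLM firmware, and the transport the XiaoZhi platform wraps around these messages, are not covered here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique within a session


def make_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


def parse_tool_result(raw: str) -> str:
    """Return the joined text content of a reply, or raise on a JSON-RPC error."""
    message = json.loads(raw)
    if "error" in message:
        raise RuntimeError(f"tool call failed: {message['error'].get('message')}")
    parts = message["result"].get("content", [])
    return "".join(p["text"] for p in parts if p.get("type") == "text")


# "set_motor_speed" is a made-up tool name used only for illustration.
request_json = make_tool_call("set_motor_speed", {"left": 50, "right": 50})
```

Keeping the request builder separate from the reply parser means the same helpers could sit on top of whatever transport carries the JSON.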

In terms of looks and hardware specs, it’s very similar to the M5Stack CoreS3 module, but it drops several of the CoreS3’s features, such as the 6-axis IMU (gyroscope/accelerometer), magnetometer, and proximity sensors. The device follows a trend of ESP32-based smart AI devices, similar to Espressif’s own EchoEar voice assistant development kit and the ESP Private Agents platform.
The WonderLLM AI Vision Module is available now on the Hiwonder online store for $29.99. There is also a bundle option with mounting brackets and a USB cable that sells for $35.99. It’s also available on AliExpress, but at a ridiculously high price tag of $88.45, and the bundle option costs $93.07.

Debashis Das is a technical content writer and embedded engineer with over five years of experience in the industry. With expertise in Embedded C, PCB Design, and SEO optimization, he effectively blends difficult technical topics with clear communication.





These only connect to a Chinese chatbot via a login on a web portal, so it's not a very convenient way to actually use one for daily interaction.
Is the hardware supported by ESPHome? In that case, local voice via Home Assistant could be the way to go without other people's computers (AKA the cloud).