Hand Gesture Recognition on ESP32-S3 with the ESP-DL library

Ali Hassan Shah has deployed a deep learning model for hand gesture recognition on the ESP32-S3-EYE board using the ESP-DL library, achieving AI-powered gesture recognition with a 0.7-second latency on the ESP32-S3 camera board.

Last year, Espressif released the ESP-DL library for the ESP32-S3 microcontroller, which features AI vector extensions, as well as for the ESP32 and ESP32-S2. It came with a face detection demo that ran much faster on the ESP32-S3 than on the other chips. Ali rolled out his own AI gesture recognition solution and wrote a step-by-step tutorial along the way.

ESP32-S3 AI Gesture Recognition

The main steps to deploying a custom model with the ESP-DL library include:

  • Model development, which involves:
    • Obtaining or creating a dataset. In this case, one downloaded from Kaggle with six gestures, namely Palm, I, Thumb, Index, Ok, and C.
    • Splitting the data into training, test, and calibration sets
    • Building a convolutional neural network (CNN) model
    • Training the model
    • Saving the model in the Hierarchical Data Format (HDF5, .h5)
    • Converting the .h5 model to the ONNX format for compatibility with the ESP-DL library
  • ESP-DL format: one more conversion, from ONNX to the ESP-DL format, using the PyCharm IDE to run the Python optimizer provided by the ESP-DL library
  • Model Deployment steps
    • Create a new project in VS Code based on the ESP-IDF framework
    • Model definition: import the required libraries, then declare, initialize, build, and call the layers
  • Model Run steps
    • Import the required libraries
    • Declare the input
    • Set input shape
    • Call a model
    • Monitor the output
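
Two Python-side ideas in the workflow above, the training/test/calibration split and the role of the calibration set during conversion, can be illustrated with a small dependency-free sketch. It assumes symmetric int8 quantization with power-of-two scales (x ≈ q × 2^exponent), which is how ESP-DL's documentation describes its quantization scheme; in the real flow, ESP-DL's own tools perform the calibration on activations recorded while running the calibration images through the network. The split fractions, the sample values, and every function name here (split_dataset, choose_exponent, quantize, dequantize, top_gesture) are hypothetical and not part of ESP-DL or Ali's tutorial.

```python
import math
import random

# The six gesture classes of the Kaggle dataset mentioned in the article.
GESTURES = ["Palm", "I", "Thumb", "Index", "Ok", "C"]


def split_dataset(samples, test_frac=0.2, calib_frac=0.1, seed=0):
    """Shuffle samples and split them into training, test and calibration sets.

    The fractions are illustrative defaults, not values from the tutorial.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_calib = int(len(items) * calib_frac)
    test = items[:n_test]
    calib = items[n_test:n_test + n_calib]
    train = items[n_test + n_calib:]
    return train, test, calib


def choose_exponent(calib_values, bits=8):
    """Smallest power-of-two exponent whose scale lets the largest magnitude
    seen in the calibration data fit in a signed `bits`-bit integer."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in calib_values)
    if max_abs == 0:
        return 0
    return math.ceil(math.log2(max_abs / qmax))


def quantize(values, exponent, bits=8):
    """Symmetric quantization: q = clamp(round(x / 2**exponent))."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2.0 ** exponent
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, exponent):
    """Approximate the original floats: x ~= q * 2**exponent."""
    return [q * 2.0 ** exponent for q in qvalues]


def top_gesture(scores):
    """Map the model's six output scores to the most likely gesture name."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return GESTURES[best]


if __name__ == "__main__":
    samples = [(f"img_{i}.png", GESTURES[i % len(GESTURES)]) for i in range(100)]
    train, test, calib = split_dataset(samples)
    print(len(train), len(test), len(calib))  # 70 20 10

    activations = [0.5, -1.2, 3.9, 0.01]  # made-up calibration values
    exp = choose_exponent(activations)
    print(exp, quantize(activations, exp))  # -5 [16, -38, 125, 0]
    print(top_gesture([3, -20, 41, 7, 0, -5]))  # Thumb
```

Rounding the exponent up guarantees that the largest calibration value still fits in the int8 range, at the cost of some resolution for small values; restricting scales to powers of two lets the MCU rescale with bit shifts instead of multiplications.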

While it’s great to be able to run AI workloads on relatively low-end hardware like the ESP32-S3 MCU, it’s not exactly easy and requires some work. I suppose that’s why solutions such as Edge Impulse were created, although I did not find Edge Impulse particularly straightforward when I tried it with the Xiao BLE Sense board. In hindsight, though, it looks much easier than deploying a custom model with the ESP-DL library. If you’d just like to try out Ali’s gesture recognition demo on the ESP32-S3-EYE board, that is much easier, and the resources and instructions can be found on GitHub.

David Jashi
1 year ago

OK, one application I can think of is a light switch, but then you would have to remove the IR filter from the camera and add IR lighting, I guess.

1 year ago

Isn’t that too complicated a solution for a light switch, in a world trying to reduce electricity use, pollution from power generation, and e-waste?

Houses X light switches = lots
