Hand Gesture Recognition on ESP32-S3 with the ESP-DL library

Ali Hassan Shah has deployed a deep learning model for hand gesture recognition on the ESP32-S3-EYE camera board using the ESP-DL library, achieving a 0.7-second latency.

Last year, Espressif released the ESP-DL library for the ESP32-S3 microcontroller, which has AI vector extensions, as well as for the ESP32 and ESP32-S2, along with a face detection demo that ran much faster on the ESP32-S3. Ali has now built his own AI gesture recognition solution and documented it in a step-by-step tutorial.

ESP32-S3 AI Gesture Recognition

The main steps for deploying a custom model with the ESP-DL library are:

  • Model development, which involves
    • Getting or creating a dataset. In this case, it was downloaded from Kaggle and covers six gestures: Palm, I, Thumb, Index, Ok, and C.
    • Splitting the dataset into training, testing, and calibration sets
    • Building a CNN model
    • Training the model
    • Saving the model in the Hierarchical Data Format (.h5)
    • Converting the .h5 model to the ONNX format for compatibility with the ESP-DL library
  • ESP-DL format – One more conversion, from ONNX to the ESP-DL format, using the PyCharm IDE to run the Python optimizer provided by the ESP-DL library
  • Model Deployment steps
    • Create a new project in VS Code based on the ESP-IDF framework
    • Model definition – Import the libraries, then declare, initialize, build, and call the layers
  • Model Run steps
    • Import the required libraries
    • Declare the input
    • Set input shape
    • Call a model
    • Monitor input

While it’s great to be able to run AI workloads on relatively low-end hardware like the ESP32-S3 MCU, it’s not exactly easy and requires some work. I suppose that’s why solutions such as Edge Impulse were created, although I did not find it particularly straightforward when I tried it with the Xiao BLE Sense board. But in hindsight, it looks much easier than deploying a custom model with the ESP-DL library. If you’d just like to try out Ali’s gesture recognition demo on the ESP32-S3-EYE board, that is much easier, and the resources and instructions can be found on GitHub.

David Jashi
1 month ago

OK, one application I can think of is a light switch, but then you will have to remove the IR filter from the camera and add IR lights, I guess.

1 month ago

Is that not too complicated a solution for a light switch, in a world trying to reduce electricity use and pollution from electricity generation, as well as e-waste?

Houses X light switches = lots

