PCMFlowG722 library enables two-way real-time HD voice over ESP-NOW with the G.722 audio codec

Tanaka Masayuki’s PCMFlowG722 library enables half-duplex, two-way, real-time HD voice over ESP-NOW on ESP32 boards fitted with a speaker and a microphone, effectively turning them into walkie-talkies.

The library is a G.722 wideband codec add-on for PCMFlow, a lightweight audio decode and PCM flow library for Arduino that already supports uncompressed PCM, MP3, and FLAC audio. PCM and FLAC take too much bandwidth over ESP-NOW, and MP3 is not suitable for real-time audio, so the legacy G.722 audio codec was selected instead.

ESP-NOW two-way HD audio communication

The keyword here is “HD voice.” Two-way audio over ESP-NOW was previously implemented in projects such as Atomic14’s esp32-walkie-talkie (5 years ago) and, more recently, the well-documented Adafruit ESP-NOW Walkie-Talkie project, but these typically rely on lower-quality G.711 audio or compressed audio.

The PCMFlowG722 library and G.722 codec enable HD voice with “7 kHz audio at 16 kHz sampling using the same 64 kbps wire budget as G.711 — same packet size, twice the audio bandwidth”, as explained by Tanaka. The table below compares G.711, G.722, and Opus codecs and libraries.

| | PCMFlowG711 | PCMFlowG722 (this lib) | PCMFlowOpus |
|---|---|---|---|
| Audio band | narrowband (8 kHz / ≤ 3.4 kHz) | wideband (16 kHz / ≤ 7 kHz) | narrow / wide / fullband (8–48 kHz) |
| Bitrate (voice) | 64 kbps fixed | 64 kbps fixed (Mode 1) | 16–32 kbps typical |
| Compression vs raw 16-bit PCM | 2× | 4× | 10–15× |
| Codec flash footprint | < 4 KB | ~10 KB | ~150–180 KB |
| Codec CPU | negligible | low | non-trivial on M0/M3-class MCUs |
| Patent / license complexity | none (1972 standard, expired) | none (1988 standard, expired); core is Public Domain | royalty-free patent grant, BSD-3-Clause source |
| Quality | toll-grade telephony | HD voice (wideband telephony) | wideband / fullband, far better |

Opus offers lower bandwidth and full-band audio quality, but the G.722 audio codec is less complex and requires fewer resources (CPU, storage). It’s also well suited to ESP-NOW: the protocol carries payloads of up to 250 bytes, and a 20 ms G.722 voice frame at 16 kHz takes 160 bytes, the same as G.711 but with twice the audio bandwidth. Raw 16 kHz mono 16-bit PCM would need 640 bytes for the same 20 ms, so G.722 compresses it by a factor of four.
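The frame-size arithmetic above can be checked with a few lines of C++. This is only a sketch of the calculation: the names `kEspNowMaxPayload`, `frameBytes`, and `fitsInOnePacket` are made up for illustration and are not part of PCMFlowG722 or the ESP-NOW API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ESP-NOW payload limit quoted in the article, in bytes per packet.
constexpr std::size_t kEspNowMaxPayload = 250;

// Bytes produced per frame by a constant-bitrate stream.
// (Hypothetical helper, not a library function.)
inline std::size_t frameBytes(std::uint32_t bitsPerSecond, std::uint32_t frameMs) {
    assert(frameMs > 0);
    return static_cast<std::size_t>(bitsPerSecond) * frameMs / 1000 / 8;
}

// True when one frame fits in a single ESP-NOW packet.
inline bool fitsInOnePacket(std::size_t bytes) {
    return bytes <= kEspNowMaxPayload;
}

// Bitrates taken from the article's figures.
constexpr std::uint32_t kG722Bps = 64000;        // G.722 Mode 1
constexpr std::uint32_t kRawPcmBps = 16000 * 16; // 16 kHz, mono, 16-bit
```

With a 20 ms frame, `frameBytes(kG722Bps, 20)` gives 160 bytes, which fits in one packet, while `frameBytes(kRawPcmBps, 20)` gives 640 bytes, which would have to be split across several packets.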

EspNowTransceiver

The library implements a G.722 encoder that compresses 16 kHz PCM into G.722, and a decoder that does the reverse. If you want to try it on your own board, the EspNowTransceiver.ino Arduino sketch is probably the place to start. It’s a half-duplex HD-voice transceiver over ESP-NOW, and the same firmware acts as both sender and receiver.

It’s been tested on M5Stack Core2 boards with an ESP32 SoC, 8MB PSRAM, 16MB SPI flash, a small 2-inch display showing the ESP-NOW channel, a 1W speaker (1W-0928), and an SPM1423 microphone. While button A is held, audio is broadcast over ESP-NOW to one or more Core2 devices; the rest of the time, the device acts as an audio receiver.
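The push-to-talk behaviour described above boils down to a two-state switch. The following is a minimal, hardware-independent sketch of that logic and is not taken from EspNowTransceiver.ino. `Mode`, `selectMode`, and `HalfDuplexRadio` are hypothetical names, and ignoring incoming packets while transmitting is an assumption about how a half-duplex device would behave.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not the library's API.
enum class Mode { Receive, Transmit };

// Rule from the article: transmit only while the button is held.
inline Mode selectMode(bool buttonHeld) {
    return buttonHeld ? Mode::Transmit : Mode::Receive;
}

using Frame = std::vector<std::uint8_t>;

struct HalfDuplexRadio {
    std::vector<Frame> sent;    // frames that would be broadcast
    std::vector<Frame> played;  // frames that would go to the decoder and speaker

    // Called for each encoded microphone frame.
    void onMicFrame(Mode mode, const Frame& frame) {
        if (mode == Mode::Transmit) sent.push_back(frame);
    }

    // Called for each received packet.
    // Assumption: incoming audio is dropped while transmitting.
    void onPacket(Mode mode, const Frame& frame) {
        if (mode == Mode::Receive) played.push_back(frame);
    }
};
```

On real hardware, the two vectors would be replaced by a radio send call and the decoder feeding the speaker, but the mode selection stays the same.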

Via Adafruit
