Tanaka Masayuki’s PCMFlowG722 library enables half-duplex, two-way, real-time HD voice over ESP-NOW on ESP32 boards with a speaker and a microphone, effectively turning them into walkie-talkies.
The library is a G.722 wideband codec add-on for PCMFlow, a lightweight audio decoding and PCM flow library for Arduino that already supports uncompressed PCM, MP3, and FLAC. PCM and FLAC need too much bandwidth for ESP-NOW, and MP3 is not suited to real-time audio, so the legacy G.722 codec was selected instead.
The keyword here is “HD voice”, since two-way audio over ESP-NOW has been implemented before in projects such as Atomic14’s esp32-walkie-talkie (five years ago) and, more recently, the well-documented Adafruit ESP-NOW Walkie-Talkie project, but these typically relied on lower-quality G.711 audio or other compressed audio.
The PCMFlowG722 library and G.722 codec enable HD voice with “7 kHz audio at 16 kHz sampling using the same 64 kbps wire budget as G.711 — same packet size, twice the audio bandwidth”, as explained by Tanaka. The table below compares G.711, G.722, and Opus codecs and libraries.
| | PCMFlowG711 | PCMFlowG722 (this lib) | PCMFlowOpus |
|---|---|---|---|
| Audio band (sampling rate / audio bandwidth) | narrowband (8 kHz / ≤ 3.4 kHz) | wideband (16 kHz / ≤ 7 kHz) | narrow / wide / fullband (8–48 kHz) |
| Bitrate (voice) | 64 kbps fixed | 64 kbps fixed (Mode 1) | 16–32 kbps typical |
| Compression vs raw 16-bit PCM | 2× | 4× | 10–15× |
| Codec flash footprint | < 4 KB | ~10 KB | ~150–180 KB |
| Codec CPU load | negligible | low | non-trivial on M0/M3-class MCUs |
| Patent / license complexity | none (1972 standard, expired) | none (1988 standard, expired); core is Public Domain | royalty-free patent grant, BSD-3-Clause source |
| Quality | toll-grade telephony | HD voice (wideband telephony) | wideband / fullband, far better |
Opus offers lower bitrates and full-band audio quality, but G.722 is less complex and needs fewer resources (CPU and storage). It is also well suited to ESP-NOW, which carries payloads of up to 250 bytes: a 20 ms G.722 voice frame at 16 kHz is 160 bytes, the same as G.711 but with twice the audio bandwidth, while the same 20 ms of raw 16 kHz mono 16-bit PCM would take 640 bytes, too much for a single ESP-NOW packet (G.722 compresses it four times).
The library implements a G.722 encoder that compresses 16 kHz PCM into G.722 and a decoder that does the reverse. If you want to try it on your own hardware, the EspNowTransceiver.ino Arduino sketch is probably the place to start: it is a half-duplex HD-voice transceiver over ESP-NOW, and the same firmware acts as both sender and receiver.
It has been tested on M5Stack Core2 boards, which feature an ESP32 SoC, 8 MB PSRAM, 16 MB SPI flash, a 2-inch display (used here to show the ESP-NOW channel), a 1 W speaker (1W-0928), and an SPM1423 microphone. While button A is held, the device broadcasts audio over ESP-NOW to one or more other Core2 units; the rest of the time, every device acts as an audio receiver.
Via Adafruit

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.





