Those Charts Show The Benefits of Microphone Arrays for Hot Word Detection

Since I started looking more into smart speakers, including DIY ones such as the one I made with an Orange Pi Zero board and Google Assistant using a single microphone, I have been told about the importance of microphone arrays, but until now I had not seen any clear study or data on the subject. That changed today, as I came across a review of microphone arrays by the makers of Snips Voice Platform. They tested five arrays connected to a Raspberry Pi 3 running their platform, and also added a generic USB microphone to the mix. The results speak for themselves…


In that experiment, they measured the hot word detection rate while incrementally increasing the distance from 0.5 meters to 5 meters (16 ft). At each distance, the hot word was repeated 25 times at 3-second intervals, using a pre-recording to keep the voice level constant, and the same gain was used for all microphones. They ran the test in a silent room, in a room with white noise, and finally in one with background music. In the silent room, the generic USB microphone works just as well as the mic arrays up to 2 meters, but further away its success rate drops dramatically.
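To make the test protocol above concrete, here is a minimal Python sketch of the schedule and the metric: the detection rate at a given distance is simply the number of detected hot words divided by the 25 plays. The 25 repeats and 3-second interval come from the article; the 0.5 m step between distances is a hypothetical assumption, since the review only says the distance was increased incrementally. The function names are illustrative and not part of any Snips tool.

```python
# Sketch of the hot word test schedule described in the review.
# From the article: 25 repeats per distance, 3 s apart, 0.5 m to 5 m.
# ASSUMPTION: the 0.5 m step between distances is hypothetical.

REPEATS_PER_DISTANCE = 25  # from the article
INTERVAL_S = 3             # from the article
STEP_M = 0.5               # hypothetical step size


def distances(start_m=0.5, stop_m=5.0, step_m=STEP_M):
    """Return the test distances in metres, including stop_m."""
    count = int(round((stop_m - start_m) / step_m)) + 1
    return [round(start_m + i * step_m, 2) for i in range(count)]


def detection_rate(detections, repeats=REPEATS_PER_DISTANCE):
    """Fraction of the repeated hot words that triggered the detector."""
    if not 0 <= detections <= repeats:
        raise ValueError("detections must be between 0 and repeats")
    return detections / repeats


def seconds_per_distance(repeats=REPEATS_PER_DISTANCE, interval_s=INTERVAL_S):
    """Approximate playback time per distance (one slot of interval_s per play)."""
    return repeats * interval_s


if __name__ == "__main__":
    print(distances())
    print(seconds_per_distance())
    print(detection_rate(20))
```

With these assumptions each microphone is tested at 10 distances, each distance takes roughly 75 seconds of playback per room condition, and 20 detections out of 25 plays corresponds to an 80% success rate.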

The case for the arrays is even stronger when white noise is added, as the USB microphone’s success rate is only comparable at around 50 cm. Background music makes the results a bit messier: the USB microphone performs even worse than with white noise, and while the microphone arrays’ success rates drop, most of them still manage fairly well. The best array in this test is the PlayStation 3 Eye, which comes with 4 microphones and costs just $8 US on Amazon, significantly cheaper than any of its competitors in the test, including the USB microphone… ReSpeaker does not perform too badly either, and MiniDSP appears to be the weakest of the lot, especially in the tests at one meter, despite using the same XMOS XVSM-2000 chip as the ReSpeaker array.

11 Comments
Schmurtz
2 years ago

The best microphone array is the Microsoft Kinect. It works even better than this PS3 array.
The PS3 Eye is still a really great device at a great price.
I know that because we ran many tests years ago with the S.A.R.A.H community.
S.A.R.A.H is a kind of Jarvis, Google Home or Amazon Echo, but I was already using it before 2010, and it is really easy to create plugins for it or to add recognition of unknown words. It may be mainly oriented toward French speakers…
An example with kodi/xbmc:
https://youtu.be/B-l8kn_0tuM

Anyway, try the Kinect microphone array; you will not be disappointed!

Jon Smirl
2 years ago

I’d like to see the Conexant CX20924 compared too. This is a new chip that does hot word detection and array processing inside the chip itself. That is different from the X-Powers chip, which relies on the host to do the processing. The idea with the Conexant chip is that the host can stay in sleep mode while the chip watches for the hot word.

Complete dev kit:
http://conexant.com/amazon-avs/ds20924/

Jon Smirl
2 years ago

Kinect uses a custom ASIC. There is no way to buy the chip.

Jon Smirl
2 years ago

I missed the Conexant entry. The chip is $6.25, so it is doing pretty well for something that cheap. That price includes the hot word engine, so you avoid paying a license fee for something like Snowboy.

I am still waiting for someone to do this on an ESP32, using its two ADCs and with no license fees.

Drone
2 years ago

Take those test results with a LARGE grain of salt… The performance of a synthetic beam-formed mic array is all about good DSP code carefully tuned to the position and directivity patterns of the mics. This makes these arrays notoriously difficult to test and compare. Testing in an anechoic chamber will often yield widely different results when the microphones are repositioned by even a small amount. This is because the synthetic directivity pattern of all the microphones working together after post-processing can be very complex, with narrow lobes. The same effect is observed as the test frequency changes. In a working environment, echoes…
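The frequency dependence of the directivity pattern mentioned above can be illustrated with the textbook delay-and-sum model of a uniform linear array. The Python sketch below is only that idealized model, not the DSP code running on any of the tested boards: the 4 microphones, 5 cm spacing and far-field plane wave are hypothetical assumptions, and real effects such as individual mic directivity, mismatch and room echoes are ignored. It shows that the main lobe gets narrower (the first null moves closer to the steering direction) as frequency increases, and that at low frequencies such a small array has no null at all.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximately, at room temperature


def ula_response(theta_deg, freq_hz, n_mics=4, spacing_m=0.05):
    """Normalised magnitude response of an idealised delay-and-sum uniform
    linear array steered to broadside (0 degrees), for a far-field plane
    wave arriving from theta_deg. Mic count and spacing are hypothetical."""
    theta = math.radians(theta_deg)
    # Phase difference of the incoming wave between adjacent microphones
    phase = 2 * math.pi * freq_hz * spacing_m * math.sin(theta) / SPEED_OF_SOUND
    total = sum(cmath.exp(1j * k * phase) for k in range(n_mics))
    return abs(total) / n_mics


def first_null_deg(freq_hz, n_mics=4, spacing_m=0.05):
    """Angle of the first null of the broadside beam, or None if the array
    is too small relative to the wavelength to have one."""
    s = SPEED_OF_SOUND / (n_mics * spacing_m * freq_hz)
    return math.degrees(math.asin(s)) if s <= 1 else None


if __name__ == "__main__":
    for f in (500, 1000, 2000, 4000):
        pattern = [round(ula_response(a, f), 2) for a in (0, 15, 30, 45, 60)]
        print(f, first_null_deg(f), pattern)
```

Even in this simplified model, the same array geometry gives a very different beam at 2 kHz than at 4 kHz, which is one reason why single-number comparisons of arrays should be read with care.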

Hitesh Patel
2 years ago

The PS3 Eye mic requires special drivers for full support on Linux/Windows. By default, it is installed without the array mic features, and the special drivers are only available as a paid version.

Drone
2 years ago

@Jean-Luc Aufranc (CNXSoft) “Any idea about the wide price disparity? How can PS3 Eye be only $8, while others are close to $100? Or is it just because the other are dev kits, and hence cost more?”
1. The xCORE-VOICE devices are at the development level, so yes, you are paying for the “free” software tool-chain and documentation they come with.
2. At $8, the PS Eye hardware looks to me like surplus old stock bought in bulk for pennies on the dollar. The PS Eye was first introduced in 2007, a decade ago. I don’t believe the PS Eye…
