Getting Started with ReSpeaker WiFi IoT Board’s Audio Capabilities, Voice Recognition and Synthesis

ReSpeaker is a development board combining an Atmel AVR MCU, a MediaTek MT7688 WiFi module running OpenWrt, a built-in microphone, an audio jack, and I/O headers to allow for voice control and output in IoT applications. That means you could make your own Amazon Echo-like device with the board and add-ons, use it as a voice-controlled home automation gateway, and more. The board was launched on Kickstarter a few days ago and has already raised $100,000 from about 100 backers. I’ve received an early sample, so I’ll provide more information about the firmware and show how to use it with some Python scripts leveraging the Microsoft Bing Speech API.


You’ll need a micro USB to USB cable to connect the board to your computer (Linux, Windows, Mac OS…), and a speaker to connect to the board. Linux (OpenWrt) boots in a few seconds, and once it’s done, all RGB LEDs will blink continuously.

I’m using a computer running Ubuntu 16.04, and ReSpeaker is detected by the system as an Arduino Leonardo board:


This step is optional, but if you want, you can access the serial console with programs like Minicom, screen, PuTTY, or HyperTerminal, setting the connection to 57600 8N1 to get to the command line. Here’s the full boot log:

If you think something looks odd here… that’s because the serial connection misses some characters. This happened with two different computers and different USB cables. Hopefully, this is a specific issue with my sample, or, if it is a general issue, it will be fixed by the time boards ship to Kickstarter backers [Update: The company explained to me that it’s because the Atmel 32u4 and MediaTek MT7688 share the same USB port]. So instead of using the serial console, I’ll use SSH, which means I first have to connect to the ReSpeaker WiFi access point and configure it.
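If you still want to script the serial side, the sketch below shows two pieces, both under assumptions. First, it spots the board in `lsusb` output by the Arduino Leonardo USB IDs (2341:8036 is the pair commonly listed for a Leonardo; I’m assuming ReSpeaker reports the same pair since Ubuntu identifies it as one). Second, it configures a port for 57600 8N1 with Python’s standard `termios` module. The device path /dev/ttyACM0 is also an assumption, and none of this fixes the missing-characters problem described above.

```python
import os
import re
import termios

# Assumption: ReSpeaker enumerates with the Arduino Leonardo USB IDs (vendor, product).
LEONARDO_IDS = {("2341", "8036")}
LSUSB_LINE = re.compile(r"ID ([0-9a-f]{4}):([0-9a-f]{4})\s*(.*)")


def find_leonardo(lsusb_output: str) -> list:
    """Return the description of every lsusb line whose IDs match a Leonardo."""
    hits = []
    for line in lsusb_output.splitlines():
        m = LSUSB_LINE.search(line)
        if m and (m.group(1), m.group(2)) in LEONARDO_IDS:
            hits.append(m.group(3))
    return hits


def as_57600_8n1(attrs: list) -> list:
    """Return a copy of tcgetattr()-style attributes set to 57600 baud, 8N1."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cflag &= ~(termios.PARENB | termios.CSTOPB | termios.CSIZE)  # no parity, 1 stop bit
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL       # 8 data bits, receiver on
    return [iflag, oflag, cflag, lflag, termios.B57600, termios.B57600, list(cc)]


def open_console(device: str = "/dev/ttyACM0") -> int:
    """Open the serial device (path is an assumption) and apply 57600 8N1."""
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY)
    termios.tcsetattr(fd, termios.TCSANOW, as_57600_8n1(termios.tcgetattr(fd)))
    return fd
```

After `fd = open_console()`, `os.read(fd, 1024)` returns whatever the console has printed. Minicom or screen remain the simpler option for interactive use.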

ReSpeaker will show up as LinkIt_Smart_7688_XXXXX because its WiFi module is exactly the same as the one on the LinkIt Smart 7688 IoT board, and, unsurprisingly, so is the configuration interface.

First set the root password, and log in with that password.

Then go to the Network tab, select station mode, and connect to your access point by entering its password. Click Configure, and you’re done. As you can see on the right above, you can also use OpenWrt’s LuCI interface to configure networking.

Now find ReSpeaker’s IP address via your router’s DHCP client list, arp-scan, or another method:


You can now connect to the board via SSH:


and use the password you set in the web interface.
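If the router’s DHCP list isn’t handy and arp-scan isn’t installed, another option is to look for hosts that accept connections on TCP port 22, since the board runs an SSH server. This is a minimal sketch using only the standard library; the 192.168.0.0/24 network in the usage line is an assumption, so replace it with your own.

```python
import ipaddress
import socket


def port_open(host: str, port: int = 22, timeout: float = 0.3) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def ssh_candidates(network: str, port: int = 22, timeout: float = 0.3) -> list:
    """Probe every host address in `network` one by one; return those with `port` open."""
    return [str(h) for h in ipaddress.ip_network(network, strict=False).hosts()
            if port_open(str(h), port, timeout)]
```

For example, `print(ssh_candidates("192.168.0.0/24"))`. The scan is sequential, so with 254 addresses and a 0.3 s timeout it can take around 76 seconds in the worst case. Any other machine running SSH on your network will also show up, so try `ssh root@<address>` on each candidate.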

Now let’s check some CPU information:


We’ve got a MediaTek MT7688 MIPS 24K processor as advertised, so let’s check a few more details:


The board runs Linux 3.18.23, and has 7.6MB of available storage and 128MB of RAM in total.
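The same details can be collected from a script. This is a minimal sketch assuming the standard Linux /proc/cpuinfo and /proc/meminfo formats; the `system type` and `cpu model` keys are what MIPS kernels typically print, so check them against your own output.

```python
import os


def parse_colon_file(text: str) -> dict:
    """Parse 'key : value' lines as found in /proc/cpuinfo and /proc/meminfo."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() not in info:    # keep the first occurrence (first CPU)
            info[key.strip()] = value.strip()
    return info


def summary(cpuinfo: str, meminfo: str, mount: str = "/") -> dict:
    """Return SoC name, CPU model, total RAM in kB, and free space (MiB) on `mount`."""
    cpu = parse_colon_file(cpuinfo)
    mem = parse_colon_file(meminfo)
    st = os.statvfs(mount)
    return {
        "system": cpu.get("system type"),
        "cpu": cpu.get("cpu model"),
        "mem_total_kb": int(mem["MemTotal"].split()[0]),
        "free_mib": st.f_bavail * st.f_frsize / (1024 * 1024),
    }
```

On the board, pass in the contents of /proc/cpuinfo and /proc/meminfo read with `open(...).read()`. Expect MemTotal to come in a little under 128MB, since the kernel reserves part of the RAM.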

I’m now going to test the audio features with command-line tools and Python scripts, and I’ve also included a video demo at the end of this review. Since I don’t have the ReSpeaker Microphone Array add-on, I have to be fairly close to the microphone for it to work well, maybe one meter at most, or the volume is really low.

I’ll start by checking audio recording and playback, which work without any API key or internet access.
We can record audio with a 16,000 Hz sample rate, 16-bit samples, and one channel using the following command:


and play it back with aplay:


It worked OK for me, although the volume seemed quite low.
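To check a recording programmatically, Python’s standard `wave` module can read back the file’s format and compute a peak level in dBFS, which puts a number on “quite low”. At 16,000 samples per second × 2 bytes × 1 channel, raw audio takes 32,000 bytes per second, so a 5-second clip is about 160,000 bytes plus the WAV header. The `arecord` flags in the helper (`-f S16_LE -r 16000 -c 1 -d`) are standard ALSA options, but I’m assuming them rather than quoting the exact command used above, and the file name in the usage note is hypothetical.

```python
import array
import math
import sys
import wave


def arecord_cmd(path: str, seconds: int = 5) -> list:
    """Build an arecord command for 16 kHz, 16-bit little-endian, mono audio."""
    return ["arecord", "-f", "S16_LE", "-r", "16000", "-c", "1", "-d", str(seconds), path]


def inspect_wav(path):
    """Return (rate_hz, bits, channels, seconds, peak_dbfs) for a 16-bit PCM WAV."""
    with wave.open(path, "rb") as w:
        rate, width = w.getframerate(), w.getsampwidth()
        channels, frames = w.getnchannels(), w.getnframes()
        if width != 2:
            raise ValueError("expected 16-bit samples")
        samples = array.array("h", w.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return rate, width * 8, channels, frames / rate, peak_dbfs
```

For example, `subprocess.run(arecord_cmd("hello.wav"), check=True)` followed by `print(inspect_wav("hello.wav"))`. 0 dBFS is full scale, and every 6 dB lower roughly halves the amplitude, so a peak far below 0 dBFS when speaking from a meter away would match the low volume I noticed.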

Now we can do something a little more interesting, as Seeed Studio has developed a few text-to-speech and speech-to-text Python scripts. You can retrieve the scripts from the ReSpeaker GitHub account, and install one dependency to set up the board:


The scripts use the Microsoft Speech API, but in theory you could use any other speech API. Since Seeed Studio has already done all the hard work, I simply applied for a Microsoft Speech API key in order to be able to use the demo.

The key is free for testing and evaluation, but if you intend to use it in commercial products, or if, in your own case, you use more than 5,000 transactions per month, you’d need to purchase a subscription.
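If you want to stay within the free tier, a small counter stored on the board is enough: 5,000 transactions over a 30-day month works out to about 166 requests per day. The class below is a hypothetical helper, not part of Seeed Studio’s scripts; the idea is to call `use()` before each API request.

```python
import datetime
import json
from pathlib import Path

FREE_MONTHLY_LIMIT = 5000  # free-tier transactions per month, as quoted above


class QuotaCounter:
    """Hypothetical helper: persist per-month API call counts in a JSON file."""

    def __init__(self, path, limit: int = FREE_MONTHLY_LIMIT):
        self.path = Path(path)
        self.limit = limit

    def _load(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def use(self, today=None) -> int:
        """Count one transaction and return how many remain this month."""
        month = (today or datetime.date.today()).strftime("%Y-%m")
        counts = self._load()
        used = counts.get(month, 0)
        if used >= self.limit:
            raise RuntimeError(f"free quota of {self.limit} calls used for {month}")
        counts[month] = used + 1
        self.path.write_text(json.dumps(counts))
        return self.limit - counts[month]
```

Counts are keyed by calendar month, so the counter resets on the 1st, which may not match how Microsoft bills a subscription.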

You’ll find three Python scripts in the directory, namely bing_voice.py, bing_stt_with_vad.py, and tts.py. Look for BING_KEY inside each script, and paste your own key.
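Rather than pasting the key by hand three times, you could patch it in from an environment variable, which also keeps the key out of anything you commit to git. This sketch assumes each script assigns the key on a single line such as `BING_KEY = '...'`; I haven’t verified the scripts’ exact formatting, so check them before running it.

```python
import os
import re
from pathlib import Path

SCRIPTS = ["bing_voice.py", "bing_stt_with_vad.py", "tts.py"]
# Assumption: the key is assigned on one line, e.g. BING_KEY = '' or BING_KEY = "..."
KEY_LINE = re.compile(r"""^([ \t]*BING_KEY[ \t]*=[ \t]*)(['"]).*?\2""", re.MULTILINE)


def set_key(source: str, key: str) -> str:
    """Return `source` with the quoted BING_KEY value replaced by `key`."""
    new, count = KEY_LINE.subn(lambda m: m.group(1) + m.group(2) + key + m.group(2), source)
    if count == 0:
        raise ValueError("no BING_KEY assignment found")
    return new


def patch_scripts(directory: str = ".", key: str = None) -> None:
    """Write the key (default: $BING_KEY) into each of the three scripts in place."""
    key = key or os.environ["BING_KEY"]
    for name in SCRIPTS:
        path = Path(directory, name)
        path.write_text(set_key(path.read_text(encoding="utf-8"), key), encoding="utf-8")
```

`set_key` raises instead of silently skipping a file, so a script that stores its key differently is easy to spot.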

Time to have some fun, starting with the speech to text script:


It’s pretty slow to start (about 15 seconds), and then there are a few error messages before you see the “* recording” message, at which point you can talk, with Bing returning the results: “Bing:你好” (“hello”). Chinese? Yep, Chinese is currently the default, but if it is not your strongest language, you can edit bing_stt_with_vad.py and change the language by replacing zh-CN with en-US or another language string:


English works too (sort of):


In the first sentence, I said “Hello World! Welcome to CNX Software today”, but it came out as “hello world next software”, maybe because of my accent, but I doubt it…
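To put a number on “sort of”, speech recognition is usually scored with word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between what was said and what was recognized, divided by the number of words actually said. This is my own illustration, not something the ReSpeaker scripts compute. For the sentence above, 7 words were spoken and at least 4 edits are needed (3 deletions plus 1 substitution), so WER = 4/7, or about 57%.

```python
import re


def words(text: str) -> list:
    """Lower-case and split into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # empty reference: j insertions
    for i, r in enumerate(ref, 1):
        cur = [i]  # empty hypothesis: i deletions
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,             # delete r
                           cur[j - 1] + 1,          # insert h
                           prev[j - 1] + (r != h)))  # substitute, or match for free
        prev = cur
    return prev[-1] / len(ref)
```

`round(wer("Hello World! Welcome to CNX Software today", "hello world next software"), 3)` gives 0.571.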

Then I wanted to try Thai, but I got an API failure simply because the Microsoft Speech API supports a limited number of languages, as shown in the table below.

language-Country language-Country language-Country language-Country
ar-EG* en-IN fr-FR pt-BR
ca-ES en-NZ it-IT pt-PT
da-DK en-US ja-JP ru-RU
de-DE es-ES ko-KR sv-SE
en-AU es-MX nb-NO zh-CN
en-CA fi-FI nl-NL zh-HK
en-GB fr-CA pl-PL zh-TW
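Checking the language string against this list before calling the API would turn a failure like my Thai attempt into a clear local error. In this minimal sketch, the set is copied from the table above (the meaning of the asterisk on ar-EG isn’t explained here, so it’s kept as a plain entry), and `switch_language` simply performs the zh-CN to en-US replacement described earlier on the script’s text.

```python
# Language-country codes copied from the table above.
SUPPORTED = {
    "ar-EG", "ca-ES", "da-DK", "de-DE", "en-AU", "en-CA", "en-GB",
    "en-IN", "en-NZ", "en-US", "es-ES", "es-MX", "fi-FI", "fr-CA",
    "fr-FR", "it-IT", "ja-JP", "ko-KR", "nb-NO", "nl-NL", "pl-PL",
    "pt-BR", "pt-PT", "ru-RU", "sv-SE", "zh-CN", "zh-HK", "zh-TW",
}


def switch_language(source: str, new: str, old: str = "zh-CN") -> str:
    """Replace the `old` language string in a script's text, refusing unsupported codes."""
    if new not in SUPPORTED:
        raise ValueError(f"{new} is not in the supported list; the API call would fail")
    if old not in source:
        raise ValueError(f"{old} not found in the script")
    return source.replace(old, new)
```

Reading bing_stt_with_vad.py, passing its text through `switch_language(text, "en-US")`, and writing it back reproduces the manual edit.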

If your language is not listed here, you could use the Google Speech API instead, and it’s likely Seeed Studio or the community will have written compatible scripts by the time ReSpeaker boards ship to backers.
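Acting on recognized text can be as simple as keyword matching. This sketch assumes the legacy Linux sysfs GPIO interface (/sys/class/gpio), with the pin already exported and configured as an output; the GPIO number and phrases are hypothetical, so map them to whatever is wired to your board.

```python
from pathlib import Path

# Hypothetical mapping: spoken phrase -> (GPIO number, value to write).
COMMANDS = {
    "light on": (18, "1"),
    "light off": (18, "0"),
}


def handle(text: str, gpio_root: str = "/sys/class/gpio"):
    """Write the GPIO value for the first phrase found in `text`; return it, or None."""
    spoken = text.lower()
    for phrase, (pin, value) in COMMANDS.items():
        if phrase in spoken:
            # Assumes gpio<pin> was already exported and its direction set to "out".
            Path(gpio_root, f"gpio{pin}", "value").write_text(value)
            return phrase
    return None
```

The `gpio_root` parameter exists mainly so the function can be tried against a scratch directory before touching real pins.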

So you now know how to convert your voice to text, and you can use that text to run a web search or toggle GPIOs, but you may also want an audio answer to your action. The tts.py script is there for you, and it’s very easy to use:


It did not sound very natural, but at least I could understand the female voice through the speaker. Looking at the script, I did not see any language setting, so I assumed the API would detect the language automatically and input a string in French instead, but all I heard was gibberish. Finally, I found that you can change the voice language in the bing_voice.py script, which contains most of the code:


I replaced the US female voice with a French male voice, and added a “famous French saying”:


At least it was understandable, but Microsoft still has some work to do, as the audio output sounded more like “Salut mon gars. commencer a va?”. The reason could also be that the correct spelling is “Comment ça va” (the phrase means roughly “Hi mate, how’s it going?”), but the terminal (set to UTF-8) did not let me input “ç”.
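One way around the “ç” problem is to write the character as the Python escape `\u00e7` and send the request body as UTF-8 bytes. The sketch below builds an SSML document of the kind Bing’s text-to-speech documentation described, with `xml:lang`, `xml:gender`, and a `Microsoft Server Speech Text to Speech Voice (locale, name)` voice name. I haven’t verified that bing_voice.py builds its request exactly this way, so treat the voice strings and the structure as assumptions.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed Bing voice names, following the documented naming pattern.
VOICES = {
    "en-US": ("Female", "Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)"),
    "fr-FR": ("Male", "Microsoft Server Speech Text to Speech Voice (fr-FR, Paul, Apollo)"),
}


def ssml(text: str, lang: str = "fr-FR") -> bytes:
    """Build an SSML request body for `lang`, XML-escaped and encoded as UTF-8."""
    gender, name = VOICES[lang]
    body = (f"<speak version='1.0' xml:lang='{lang}'>"
            f"<voice xml:lang='{lang}' xml:gender='{gender}' name={quoteattr(name)}>"
            f"{escape(text)}</voice></speak>")
    return body.encode("utf-8")  # bytes, so 'ç' survives regardless of terminal settings
```

Calling `ssml("Salut mon gars. Comment \u00e7a va?")` produces a body with the correct “ç” without ever typing it in the terminal.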

You can watch all those demos in the video below to get a better feel for the audio quality, delays, and capabilities of the Microsoft Bing Speech API.

JM

They didn’t send you the extra microphone array? Seems to be most interesting thing in that KS

Gary

By the time you’ve bought the Core board, the Mic Array add-on board, the Meow King driver unit, and spent many hours messing around with it, you may wanna cut your losses and buy a Google Home or Amazon Echo. Should save you some time.

JM

@Gary
You’ll also be giving up control and passing your data straight over to Amazon or Google who will mine it down to the last tick in ways you probably don’t even imagine.

This may take more time, but at least it’s yours to control and choose what to do, what services to use, etc.

If you’re not into customisation and lots of DIY you may be reading the wrong website 🙂

anon

@JM

Care to read the text above? This respeaker thing sends all things you speak at your home to microsoft and google.

Gary

@anon
@JM Well, it was a hypothetical comparison on price and effort, not on how demonic Google are; nevertheless, if you represent this site and, based on one comment, suggest it’s not for me, maybe you’re right.

As for your delusion on having control of anything….mwaaahahahaha!

JM

@anon
I’m not sure you understand what you’ve read, but here you’re obviously free to choose any service, not just Microsoft or Google.

This includes Nuance, IBM Bluemix or even your own voice processing stack.

FergusL

I guess it won’t be the primary feature users are looking for but I can comment on this hardware in terms of audio latency.

As I could guess from the article, this is the very same base image and kernel as in the LinkIt Smart 7688, which I own. Even the audio codec is the same component.
From my tests, I could not get to a latency below 17.4 milliseconds for audio playback. This is way above the maximum latency I’d have hoped for, in the context of making musical instruments.

It might be possible to improve the limiting I2S driver to reach decent timings.

@Gary @JM @anon
I suppose the debate is at an even higher level than that: Google and Amazon are probably selling devices for *end users*, whereas the ReSpeaker is designed for developers/hackers/makers. Two very different products!