Archive

Posts Tagged ‘audio’

Espressif ESP32 LyraTD MS1 HDK is Designed for Smart Speakers, Wireless Audio and other Smart Home Appliances

January 16th, 2018 7 comments

So apparently voice commands will represent 50% of all searches in the next two years, and everybody is jumping on the smart speaker bandwagon, with announcements from many companies at CES 2018, including Google’s Android Things + Assistant products announcement, the NXP i.MX 8M official launch, Amazon Alexa Voice Service (AVS) development kits from Amlogic and Allwinner, and more.

Espressif Systems is about to join the party with their ESP32 LyraTD MS1 HDK (Hardware Development Kit), which most people will likely remember as the “Audio Mic HDK” announced on Twitter.

Espressif Audio Mic HDK specifications:

  • Wireless Module – ESP32-WROVER module with 802.11 b/g/n WiFi and Bluetooth 4.1 LE connectivity.
  • DSP – 4-mic array chip
  • Storage – micro SD card for audio files
  • Audio
    • Audio driver chip
    • Earphone jack
    • Dual speaker output ports
    • 4x microphone array with up to 3 meter sensitivity while playing music
  • Expansion
    • I2C/SPI header
    • 6-pin UART header
    • I2S header
    • Others undocumented
  • Debugging – USB-UART micro USB interface (based on CP2102N), and JTAG header
  • Misc – Power switch, 8x keys on top
  • Power Supply – 5V via micro USB port

The kit can work over WiFi or Bluetooth, and supports major cloud voice vendors such as Amazon Alexa, Google Assistant, and Baidu DuerOS. The soft decoder and hot word recognition run directly on the ESP32 processor.

On Twitter, the company also said you could implement your own hotword/keyword by providing around 5,000 unique recordings of your selected word, and that they expect to ship the board next week. It’s unclear when the board will be available for sale, however.

One of the commenters mentioned he made his own ESP32 Circle evaluation kit with an audio jack and a single microphone. If you are interested in that third party board, you can purchase it on Taobao for 169 RMB (~$26). The official Espressif Audio Mic HDK should sell for a bit more due to the extra features.

Google Assistant SDK Now Supports Device Actions, More Languages (French, German, Japanese)

December 29th, 2017 1 comment

Back in May 2017, Google released the Assistant SDK that worked on Raspberry Pi 3 and other ARM boards, essentially transforming low cost development boards into Google Home equivalents. The SDK became more popular once Google’s AIY Voice Kit was launched, since it offered an easy and inexpensive way to use it with a Raspberry Pi 3 board.

Since all you needed was a Linux board with an Internet connection, a microphone, and a speaker, I tried Google Assistant SDK on one of the cheapest platforms available: Orange Pi Zero Set 6 Kit, which includes the Orange Pi Zero board, an expansion board with built-in microphone and audio output jack, and a cute little case. I added my own pair of speakers, micro SD card, and USB power supply, and after setting up the software, I was able to ask questions and get answers in a female voice using the demo app.

At the time however, there were some limitations: integration with home automation devices was not easy, English US was the only language option, and we were stuck with a female voice. Since then, Google has added support for a male voice for text-to-speech, and as I checked the release notes, I found that Google added support for more languages and device actions on December 20.

Changelog:

  • Google Assistant Library (developer preview 0.1.0)
    • Support for Device actions.
    • Support for more languages: English (Australia, Canada, UK, US), French (Canada, France), German, and Japanese. Selectable from Google Assistant app.
    • Location can now be configured as a street address in the Google Assistant app.
    • Better handling of connection errors.
  • Google Assistant Service (v1alpha2)
    • Support for Device actions.
    • Support for more languages: English (Australia, Canada, UK, US), French (Canada, France), German, and Japanese. This setting can be passed through the Service API or selected from the Google Assistant app.
    • Location can now be configured as a street address in the Google Assistant app, or as a latitude and longitude via the API.
    • Support for displaying the text of the user’s request and the text response from the Google Assistant.
    • Support for submitting queries via text input (Using Device actions or IFTTT).

I could update the library on Orange Pi Zero as follows:
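A minimal sketch of such an update, assuming the SDK was installed with pip inside a Python virtual environment. The two package names are the ones published on PyPI for the Assistant library and SDK samples; the helper names (upgrade_commands, upgrade) are hypothetical, and nothing is executed unless run=True is passed.

```python
import subprocess
import sys

# PyPI package names for the Assistant library and the SDK samples.
# Assumption: both were originally installed with pip in the active environment.
PACKAGES = ["google-assistant-library", "google-assistant-sdk[samples]"]


def upgrade_commands(python=sys.executable, packages=PACKAGES):
    """Return one 'pip install --upgrade' command (as an argument list) per package."""
    return [[python, "-m", "pip", "install", "--upgrade", pkg] for pkg in packages]


def upgrade(run=False):
    """Print each upgrade command, and execute it only when run=True."""
    for cmd in upgrade_commands():
        print(" ".join(cmd))
        if run:
            subprocess.check_call(cmd)
```

Calling upgrade() only prints the commands, which is handy for checking them first; upgrade(run=True) from within the activated virtual environment performs the actual update.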

I could still use my DIY smart speaker to ask questions and get answers after a reboot, so the update went smoothly. Controlling other devices like Sonoff TH16 will require some more studying.

VoltaStream AMP1 Linux Audio Board Includes a Stereo Audio Amplifier, Adds WiFi and Bluetooth

November 23rd, 2017 1 comment

Last summer I wrote about VoltaStream ZERO, an audio board powered by NXP i.MX6ULL processor, with up to 1GB RAM, a Texas Instruments DAC, and leveraging Raspberry Pi Zero form factor. The board runs a custom Linux distribution called PolyOS built with the Yocto Project, which includes shairport-sync, librespot, SqueezeLite, a DLNA renderer, and more.

Polyvection, the company behind the project, is now back with VoltaStream AMP1 audio development board, with half the board very similar to VoltaStream ZERO, and the other half featuring an audio amplifier and a wireless module for WiFi and Bluetooth.

VoltaStream AMP1 board specifications:

  • SoC – NXP i.MX6ULL ARM Cortex-A7 processor @ 996 MHz
  • System Memory – 512 MB DDR3
  • Storage – micro SD card slot
  • Audio
    • 1x I2S for integrated DAC and AMP, 1x I2S for GPIO access, 1x TOSLINK-IN jack
    • Analog DAC – Texas Instruments PCM1862 (SNR 103 dB)
    • Amplifier – ISSI IS31AP2121 / class-D / SNR 104 dB; 2x 35 Watt (2x 25 watt continuous)
  • Connectivity – Dual band 802.11 b/g/n/ac WiFi and Bluetooth 4.1 (Qualcomm QCA9377)
  • USB – 1x micro USB slave port (USB gadget mode supported), 1x USB type A host port
  • Expansion Headers – 40-pin GPIO header with 5V, 3V3, GND, 2x UART, flexCAN, 2x I2C, SPI, I2S, 3x PWM, S/PDIF input
  • Misc – Integrated button handler / accessible from header
  • Power Supply – 5V via micro USB port (TBC), or 10 to 20V via barrel jack
  • Power consumption – TBD (For reference: VoltaStream ZERO: 0.25 Watt – Linux idle)
  • Dimensions – 65 mm x 56 mm

VoltaStream AMP1 board will ship with a 100 mm external antenna with adhesive, and two wooden plates for a simple case together with the required screws and spacers. You’ll need to add your own power supply (30W or greater), a micro SD card with 2GB or greater capacity, and of course the speakers.

The board supports the same audio specific PolyOS operating system, or a more generic PolyBian Debian based Linux distribution. Documentation including a getting started guide and schematics (PDF), as well as download links, can be found on the product page, where you’ll also be able to purchase the board for 83.19 Euros plus VAT where applicable, and shipping costs.

Thanks to Frederic for the tip.

Intel Speech Enabling Developer Kit Works with Alexa Voice Service, Raspberry Pi 3 Board

October 28th, 2017 4 comments

We’ve known for several months that Intel has been working on Quark S1000 “Sue Creek” processor for voice recognition. S1000 SoC is based on two Tensilica LX6 cores with HiFi3 DSP, some speech recognition accelerators, and up to 8x microphone interfaces, which allows it to perform speech recognition locally. The solution can also be hooked to an application processor via SPI, I2S and USB (optional) when cloud based voice recognition is needed.

Intel has recently introduced their Speech Enabling Developer Kit working with Amazon Alexa Voice Service (AVS) featuring a “dual DSP with inference engine” – which must be Quark S1000 – and an 8-mic array. The kit also includes a 40-pin cable to connect to the Raspberry Pi 3 board.

Intel only provided basic specifications for the kit:

  • Intel’s dual DSP with inference engine
  • Intel 8-mic circular array
  • High-performance algorithms for acoustic echo cancellation, noise reduction, beamforming and custom wake word engine tuned to “Alexa”
  • 6x Washers
  • 3x 6mm screws
  • 3x 40mm female-female standoffs
  • Raspberry Pi connector cable

I could not find detailed information to get started, except for the assembly guide shown in the video below. We do know that the kit will work with Amazon Alexa, and requires a few extra bits, namely a Raspberry Pi 3 board, an Ethernet cable, an HDMI cable and monitor, USB keyboard and mouse, an external speaker, a micro USB power supply (at least 5V/1A), and a micro SD card.

The video also points to Intel’s Smart Home page for more details about software, but again I could not find instructions or a guide there, except for links to register for a developer workshop at Amazon Re:Invent in Las Vegas on November 30, 2017.

Intel Speech Enabling Developer Kit can be pre-ordered for $399 directly on Intel’s website, with shipping planned for the end of November. The product is also listed on the Amazon Developer page, but again with little specific information about the hardware and how to use it. One can assume the workflow should be similar to other AVS devkits.

Thanks to Mustafa for the tip.

AMBE+2 Vocoder Promises High Voice Quality at Low (2.0 to 9.6 Kbps) Data Rates

October 24th, 2017 4 comments

Opus 1.2 open source audio codec was released a few months ago with the ability to deliver high-quality audio for speech at bitrates as low as 12 Kbps. Digital Voice Systems (DVSI) claims to have gone even lower thanks to their AMBE+2 vocoder (Advanced MultiBand Excitation) providing high-quality speech at data rates from 2.0 to 9.6 kilobits per second.

AMBE+2 vocoder is said to outperform the company’s previous generation AMBE+ vocoder, as well as the G.729 and G.726 vocoders, while operating at only 4.0 Kbps. The vocoder is suitable for mobile radio, secure voice, satellite communication, computer telephony, digital voice and storage applications.
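To put those bitrates in perspective, here is a small arithmetic sketch converting a constant bitrate into payload bytes per frame and per minute of speech. The 20 ms frame duration is an assumption used for illustration only, the function names are hypothetical, and the figures ignore any packet or container overhead.

```python
def bytes_per_frame(bitrate_bps, frame_ms=20):
    """Payload bytes for one frame at a constant bitrate (frame_ms is an assumed value)."""
    return bitrate_bps * frame_ms / 1000 / 8


def bytes_per_minute(bitrate_bps):
    """Payload bytes needed for one minute of speech at a constant bitrate."""
    return bitrate_bps * 60 / 8


# Bitrates mentioned in the post, in bits per second.
RATES = [
    ("AMBE+2 lowest", 2000),
    ("AMBE+2 compared to G.729/G.726", 4000),
    ("AMBE+2 highest", 9600),
    ("Opus 1.2 speech", 12000),
]

for label, bps in RATES:
    print(f"{label}: {bytes_per_frame(bps):g} bytes/frame, "
          f"{bytes_per_minute(bps) / 1000:g} kB/minute")
```

At 2.0 kbps, a minute of speech fits in 15 kB of payload, against 90 kB at 12 kbps, which shows why such vocoders matter for narrow radio and satellite links.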

AMBE+2 Vocoder Chips

The solution can be integrated into products either through software licensing or through vocoder chips, and the company lists the following key benefits:

  • Maintains speech intelligibility and speaker recognition at rates as low as 2.0 kbps
  • Resistant to background noise and channel bit errors
  • Customizable data rates from 2.0 to 9.6 kbps
  • Uses fewer computations than CELP (Code-excited linear prediction) as used in G.729
  • Does not require the use of a residual signal
  • Eliminates fixed data-rate and codebook problems
  • Low complexity reduces implementation costs

You can listen to male and female samples at different bitrates for your own evaluation. DVSI claims the technology is already used in digital mobile radio and satellite telephony solutions such as Inmarsat, Iridium, DMR Communication PBX, etc…

AMBE-4020 HDK

AMBE+2 voice compression algorithm is available for DSPs and CPUs from Texas Instruments, Analog Devices, ARM, MIPS, Intel, NXP, and others, and runtime environments are available for Windows, Linux, Android, iOS, VxWorks, uC/OS, and other operating systems on request. The company can also provide hardware development kits (HDK) based on either the AMBE-3000 or AMBE-4020 AMBE+2 chip, USB based products ranging from a single channel dongle to the 12 full-duplex channel USB-3012 product, as well as Net-2000 VCUs (Voice Connect Unit) that bridge analog speech I/O to an Ethernet network, for example for VoIP or voice-monitoring / recording products (for the CIA? :)).

More details can be found on DVSI’s AMBE+2 product page.

Amlogic A111, A112 & A113 Processors are Designed for Audio Applications, Smart Speakers

September 9th, 2017 6 comments

Amlogic processors are mostly found in TVs and TV boxes, but the company is now apparently entering a new market with A111, A112, and A113 audio processors. I was first made aware of those new processors through the Buildroot OpenLinux Release Notes V20170831.pdf document posted on their Open Linux website, where two boards with Amlogic A113D and A113X are shown.

S400 Version 03 Board

First, S400 board with the following key features/specifications:

  • SoC – Amlogic A113D CPU
  • System Memory – 1GB DDR3
  • Storage – 512MB SLC NAND flash
  • Display I/F – MIPI interface
  • Connectivity – Gigabit Ethernet, SDIO WiFi/BT (AP6356S)
  • Audio
    • SPDIF_IN/SPDIF_OUT
    • LINE_IN/LINE_OUT
    • 2x Audio headers (MIC_Connector & SPK_Connector)
  • USB – 1x USB 2.0 OTG
  • Expansion – 2x PCIe ports
  • Misc – 6x ADC Keys, IR_IN/IR_OUT, UART Interface (RS232)

The second board, S420, is based on A113X SoC, and comes with fewer features (no display, no Ethernet, no PCIe…) and less memory:

  • SoC – Amlogic A113X CPU
  • System Memory – 512 MB DDR3
  • Storage – 512MB SLC NAND flash
  • Connectivity – SDIO WiFi/BT (AP6356S)
  • Audio
    • SPDIF_IN
    • LINE_IN/LINE_OUT
    • 2x Audio headers (MIC_Connector & SPK_Connector)
  • USB – 1x USB 2.0 OTG
  • Misc – 6x ADC Keys, IR_IN/IR_OUT, UART Interface (RS232)

The document also explains how to build Linux with Buildroot (you’ll need an Amlogic account), and how to use audio via applications or frameworks such as aplay, gstreamer, alsaplayer, shairport (Airplay), VLC, DLNA, etc…

Information about Amlogic A113X/A113D processor is lacking on the web, but I eventually found that Amlogic had a YouTube account with now a whopping two subscribers (including yours truly), and one of the two videos was an Alexa voice services demo on Amlogic A113 with what looks like a microphone array inserted on the top of the board.

Further research led me to a page in Chinese discussing Amlogic A111, A112, A113 audio processors, and revealing that Xiaomi AI smart speaker is based on Amlogic A112 quad core Cortex-A53 processor, which also shows up in GeekBench running Android 6.0. They also report that A113 features the same four Cortex-A53 cores, but has better audio capabilities with 8x PDM interfaces and 16x I2S interfaces. I also found a page about a microphone array designed for Amlogic S905/S912/A112, and based on Knowles SPH0645LM4H-B miniature microphones.

Finally, I decided to go directly to Amlogic website, and they do have pages for A111 and A112 SoCs, strangely not indexed by search engines so far.

Amlogic A111 key features:

  • CPU – Quad-core ARM Cortex-A5
  • Audio Interface
    • 2-channel I2S input and output
    • TDM/PCM input and output, up to 8 channels
    • S/PDIF output
  • Video Interface – LVDS and MIPI-DSI panel output
  • Security – Supports secure boot and secure OS
  • Ethernet – 10/100/1000M MAC
  • IP License (Optional) – Dolby Digital, Dolby Digital Plus, DTS Digital Surround, DTS HD, DTS Express
  • Process – 28nm HKMG

Amlogic A112 key features:

  • CPU – Quad-core ARM Cortex-A53
  • Audio Interface
    • 8-channel I2S and S/PDIF input and output
    • TDM/PCM input and output, up to 8 channels
    • 2-channel PDM input
  • Video Interface – RGB888 output
  • Security – Supports secure boot and secure OS
  • Ethernet – 10/100M MAC+PHY
  • IP License(Optional) – Dolby Digital, Dolby Digital Plus, DTS Digital Surround, DTS HD, DTS Express
  • Process – 28nm HKMG

If you are interested in evaluating / playing with those processors, and cannot get hold of Amlogic boards (since they only deal with companies), one solution is to get the Xiaomi AI smart speaker, available for pre-order/arrival notice on sites like GearBest or GeekBuying, and expected to ship on October 1st.

Thanks to vertycall for the tip.

Google Assistant News – AIY Voice Kit For Sale, Offline Support, 3rd Party Smart Speakers Announced

September 1st, 2017 5 comments

There’s been a lot of development related to Google Assistant in the last few days. First, Google provided an update for AIY Projects, with their AIY Projects Voice Kit now available for pre-order on Micro Center for $35 including a Raspberry Pi 3 board, making the kit virtually free, although you may also purchase it separately. Note that Micro Center blocks traffic originating from some countries, so I had to use Zend2 to access the site. [Update 10/09/2017: You can also get it from Seeed Studio for worldwide shipping]

Google also announced the Speech Commands Dataset with 65,000 one-second long utterances of 30 short words, which they are in the process of integrating with the next release of the Voice Kit, and which will allow the devices to respond to voice commands without the need for an Internet connection. So if you lose your Internet connection, or want to isolate your Voice Kit from it, you can still perform simple tasks like turning lights on and off.

In this first blog post, the company also showcased some projects based on the Voice Kit, and encouraged the community to provide input for the next version via Hackster.io, or to showcase their work on social networks using the #AIYprojects hashtag.

The next day, Google published another blog post explaining that Google Home, eligible Android phones, iPhones, Google Allo and Android Wear will soon be joined by third party speakers supporting Google Assistant with the same features, like answering requests, playing music, and controlling appliances. One day later, a bunch of announcements were made at IFA 2017, and the company updated their blog post with a list of 3rd party Google Assistant speakers, all scheduled to launch by the end of the year or early 2018:

  • JBL LINK 10, LINK 20 and LINK 300, respectively 8, 10 and 50 Watts WiFi smart speakers, coming to UK, Germany and France starting fall 2017 for 169 Euros, 199 Euros and 299 Euros.

  • Onkyo Smart Speaker G3 (VC-GX30) “acoustic suspension” speaker with 80mm pressed-pulp diaphragm woofer, and 20mm soft-dome tweeter. Available in Black or White.

  • Sony LF-S50G with clock showing through speaker, and to be sold for $199.99.

Availability will be limited to some countries only, likely partially due to a lack of language support, with most expected to be available in the US, UK, Australia, Canada, Germany and France.

Those Charts Show The Benefits of Microphone Arrays for Hot Word Detection

August 31st, 2017 11 comments

Since I started looking more into smart speakers, including DIY ones such as the one I made with Orange Pi Zero board + Google Assistant with a single microphone, I was told about the importance of microphone arrays, but so far, I had not seen any clear study or data about that. That changed today, as I came across a review of mic arrays by the makers of Snips Voice Platform. They tested five arrays connected to a Raspberry Pi 3 with the system, and also added a generic USB microphone to the mix. The results speak for themselves…

In that experiment, they measured the rate at which a hot word was successfully detected while incrementally increasing the distance from 0.5 meters to 5 meters (16 ft). For each distance, they repeated the hot word 25 times at 3 second intervals, using a pre-recording to keep the voice level constant, and the same gain for all microphones. They did so in a silent room, a room with white noise, and finally one with background music. The generic USB microphone works just as well as mic arrays in a silent room up to 2 meters, but further away, the success rate drops dramatically.

The case for the arrays is even better when white noise is added, as the USB microphone’s success rate is only comparable at around 50 cm. When you add background music to the mix, it’s a bit messier, with the USB microphone performing even worse than with noise, and while the microphone arrays’ success rates drop, most managed fairly well. The best array based on that test is PlayStation 3 Eye, which comes with 4 microphones and costs just $8 US on Amazon, significantly cheaper than any competitor in the test including the USB microphone… ReSpeaker does not perform too badly either, and MiniDSP appears to be the weakest of the lot, especially for tests at one meter, despite using the same XMOS XVSM-2000 chip as the ReSpeaker array.
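The success rate used in those charts is simply the fraction of the 25 repetitions at each distance in which the hot word was detected. The sketch below shows that computation; the function names are hypothetical and the sample outcomes are made-up placeholders, not the published results.

```python
def detection_rate(outcomes):
    """Fraction of trials in which the hot word was detected (True = detected)."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(outcomes) / len(outcomes)


def rates_by_distance(trials):
    """Map each distance (in meters) to its detection rate, sorted by distance."""
    return {distance: detection_rate(results)
            for distance, results in sorted(trials.items())}


# Placeholder data for one microphone in one room: 25 trials per distance.
# These values are invented for illustration and do NOT come from the review.
sample_trials = {
    0.5: [True] * 25,
    2.0: [True] * 20 + [False] * 5,
    5.0: [True] * 5 + [False] * 20,
}

for distance, rate in rates_by_distance(sample_trials).items():
    print(f"{distance} m: {rate:.0%}")
```

With only 25 repetitions per distance, each missed detection moves the rate by 4 percentage points, so small differences between arrays should not be over-interpreted.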

Categories: Audio, Hardware, Testing Tags: audio