SunFounder has sent me a review sample of the Fusion HAT+, a Raspberry Pi expansion board designed for motor and servo control and for audio interaction through its built-in microphone and speaker, with support for LLM models. It can be used as an AI-enabled robot controller, a smart home hub, a voice assistant, or an interactive learning platform.
In this review, after an unboxing and going through the installation of the Fusion HAT+ on a Raspberry Pi 5 2GB, I’ll mainly focus on the voice interaction part using text-to-speech (TTS), speech-to-text (STT), and local and cloud-based LLMs and VLMs, and also quickly test servo control to wave a flag using voice commands.
SunFounder Fusion HAT+ unboxing
I received the sample in the retail package reading “SunFounder Fusion HAT+ for Raspberry Pi” and detailing the key features, namely rechargeable battery, 12x PWM, onboard speaker and microphone, 4x 12-bit ADC, safe shutdown, 4x DC motor drivers, I2C/SPI/UART expansion, and compatibility with Raspberry Pi 5/4/3B+/Zero 2W.

The package contains the Fusion HAT+ board, three cables for motors, a package content and assembly guide, a 7.4V/2,000mAh Li-Ion battery, and a few screws and nylon standoffs for assembly.

The Fusion HAT+ itself is powered by a GD32 microcontroller, features battery and USB-C “Charge” connectors, configuration jumpers, a 40-pin GPIO header, PWM and servo connectors, four motor control ports, an I2C connector, a speaker, and a few buttons and LEDs.

The microphone can be found on the bottom side of the board.
Raspberry Pi 5 and Fusion HAT+ assembly
I’ll use a Raspberry Pi 5 2GB SBC for testing, and assembly is fairly straightforward.
First, install the standoffs and secure them with screws. Then insert the Fusion HAT+ on the Raspberry Pi 5, add four more screws, connect the battery, and you’re done. As usual, SunFounder will add a few spare parts in case they get lost during assembly, and I ended up with two short standoffs, two long standoffs, and two screws. I always like it when companies do that, since I’m not a big fan of searching for screws on the floor or under furniture.
A rookie mistake
After the hardware was ready, I connected the kit to a 5V/5A Raspberry Pi USB-C power adapter and followed the instructions to test various demos successfully. But after a little over 24 hours, it all stopped working, even samples that used to work:
```
pi@raspberrypi:~/fusion-hat/examples $ sudo python tts_piper.py
2026-01-25 16:53:02.060985944 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
Traceback (most recent call last):
  File "/home/pi/fusion-hat/examples/tts_piper.py", line 3, in <module>
    tts = Piper()
  File "/usr/local/lib/python3.13/dist-packages/fusion_hat/tts.py", line 126, in _piper_init_with_speaker
    enable_speaker()
    ~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/dist-packages/fusion_hat/device.py", line 145, in wrapper
    raise_if_fusion_hat_not_ready()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/dist-packages/fusion_hat/device.py", line 133, in raise_if_fusion_hat_not_ready
    raise IOError("Fusion HAT not connected, check if Fusion Hat is powered on.")
OSError: Fusion HAT not connected, check if Fusion Hat is powered on.
```
The kernel log indicated a hardware issue as the Pi 5 was unable to communicate with the Fusion HAT+ over I2C:
```
[    3.311487] fusion_hat: loading out-of-tree module taints kernel.
[    3.329380] fusion_hat 1-0017: I2C write word failed: -121
[    3.329389] fusion_hat 1-0017: Failed to initialize channel 0: -121
[    3.329391] fusion_hat 1-0017: Failed to initialize PWM subsystem: -121
[    3.330123] fusion_hat 1-0017: I2C write word failed: -121
[    3.330583] fusion_hat 1-0017: I2C write word failed: -121
[    3.330719] fusion_hat 1-0017: I2C write word failed: -121
[    3.330852] fusion_hat 1-0017: I2C write word failed: -121
[    3.330984] fusion_hat 1-0017: I2C write word failed: -121
[    3.331122] fusion_hat 1-0017: I2C write word failed: -121
[    3.331258] fusion_hat 1-0017: I2C write word failed: -121
[    3.331400] fusion_hat 1-0017: I2C write word failed: -121
[    3.331537] fusion_hat 1-0017: I2C write word failed: -121
[    3.331909] fusion_hat 1-0017: I2C write word failed: -121
[    3.332048] fusion_hat 1-0017: I2C write word failed: -121
[    3.332308] fusion_hat 1-0017: I2C write word failed: -121
[    3.343487] fusion_hat 1-0017: probe with driver fusion_hat failed with error -121
```
So I powered off the board and tried reseating the HAT+, but it did not help. In the morning, I noticed the Raspberry Pi 5 had turned itself off, but since I could still use it normally after starting it again, I didn’t think much of it. The online LLM demo with Gemini would still work (it doesn’t use the HAT). I could not find a clear solution, so I assumed a hardware failure. But after reading a bit more, I realized that the two orange battery LEDs are supposed to be on when the battery is fully charged, and they were off as if the battery were depleted, even though USB power had been connected the whole time. I also disconnected the battery and tried again, and the same error occurred.
A facepalm moment happened shortly after, once I realized it was probably not a good idea to have connected the USB-C power supply to the USB-C port on the Raspberry Pi 5…
… instead of the USB-C “Charge” port on the Fusion HAT+. It didn’t seem like a mistake at first since the battery was fully charged and everything worked smoothly until I realized it was depleted the following day.
That also means the battery is not optional, and must be connected at all times for the Fusion HAT+ to work.
Software installation
I mostly followed the instructions on the documentation website for that part and most of the rest of the review.
The first step is to launch raspi-config, and go to the A11 – Shutdown behavior section:
We need to select the “B1 Full power off” option.
After a reboot, we can now install the fusion-hat package to get the necessary libraries and tools for operating the Fusion HAT hardware:
```shell
curl -sSL https://raw.githubusercontent.com/sunfounder/sunfounder-installer-scripts/main/install-fusion-hat.sh | sudo bash
```
After one more reboot, we can execute the audio setup script:
```shell
sudo /opt/setup_fusion_hat_audio.sh
```
Here’s the output of the command for reference:
```
get_sink_index ...
sink index not found.
Sometimes you need to reboot to activate the soundcard.
get_source_index ...
source index not found.
Sometimes you need to reboot to activate the soundcard.
set default Pulseaudio volume to 100% ...
Do you wish to test speaker now? [y/N] y
open speaker ...
/opt/setup_fusion_hat_audio.sh: line 411: /sys/class/fusion_hat/fusion_hat/speaker: No such file or directory
testing speaker ...
speaker-test 1.2.14

Playback device is default
Stream parameters are 48000Hz, S16_LE, 1 channels
WAV file(s)
Rate set to 48000Hz (requested 48000Hz)
Buffer size range from 2048 to 8192
Period size range from 1024 to 1024
Periods = 4
was set period_size = 1024
was set buffer_size = 4096
 0 - Front Left
Time per period = 1.403260
 0 - Front Left
Time per period = 1.495619
 0 - Front Left
Time per period = 1.474478
All done!
```
I didn’t get any audio at that point, but after a reboot, I could confirm audio worked.
Fusion HAT+ Text-to-Speech (TTS) options
SunFounder offers support for several TTS engines. We’ll start with Espeak:
```
pi@raspberrypi:~ $ cd fusion-hat/examples/
pi@raspberrypi:~/fusion-hat/examples $ sudo python3 tts_espeak.py
```
I could hear “Hello! I’m Espeak TTS”.
Users can control the volume, speaking speed, word gap, and pitch:
```python
from fusion_hat.tts import Espeak

# Create Espeak TTS instance
tts = Espeak()

# Set amplitude 0-200, default 100
tts.set_amp(200)
# Set speed 80-260, default 150
tts.set_speed(150)
# Set gap 0-200, default 1
tts.set_gap(1)
# Set pitch 0-99, default 80
tts.set_pitch(80)

tts.say("CNXSoft is in da house!")
```
It works and it’s understandable, but the voice is fairly robotic.
Another option is pico2wave:
```
pi@raspberrypi:~/fusion-hat/examples $ sudo python tts_pico2wave.py
```
I could hear “Hello! I’m Pico2wave TTS” in a more natural way than Espeak.
Let’s edit the code:
```python
from fusion_hat.tts import Pico2Wave

# Create Pico2Wave TTS instance
tts = Pico2Wave()

# Set the language
tts.set_lang('en-US')  # en-US, en-GB, de-DE, es-ES, fr-FR, it-IT

# Quick hello (sanity check)
tts.say("CNXSoft is in da house!")
```
Here’s a demo with both sentences.
CNXSoft was not recognizable here, but it was fine when using a more standard sentence.
```python
tts.say("Hello from Chiang Mai, Thailand!")
```
The third option is Piper TTS:
```
pi@raspberrypi:~/fusion-hat/examples $ sudo python3 tts_piper.py
```
I could hear “Hello, I’m Piper TTS” just fine. The output shows it supports a wider range of languages:
```
2026-01-24 21:00:25.335876115 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
['ar_JO', 'ca_ES', 'cs_CZ', 'cy_GB', 'da_DK', 'de_DE', 'el_GR', 'en_GB', 'en_US', 'es_ES', 'es_MX', 'fa_IR', 'fi_FI', 'fr_FR', 'hu_HU', 'is_IS', 'it_IT', 'ka_GE', 'kk_KZ', 'lb_LU', 'lv_LV', 'ml_IN', 'ne_NP', 'nl_BE', 'nl_NL', 'no_NO', 'pl_PL', 'pt_BR', 'pt_PT', 'ro_RO', 'ru_RU', 'sk_SK', 'sl_SI', 'sr_RS', 'sv_SE', 'sw_CD', 'tr_TR', 'uk_UA', 'vi_VN', 'zh_CN']
[]
```
I modified the code to change the string:
```python
from fusion_hat.tts import Piper

tts = Piper()

# List supported languages
print(tts.available_countrys())

# List models for English (en_us)
print(tts.available_models('en_us'))

# Set a voice model (auto-download if not already present)
tts.set_model("en_US-amy-low")

# Say something
tts.say("CNXSoft is in da house!")
```
I find it’s the clearest TTS solution so far.
The final option is OpenAI TTS, but it didn’t work, probably because I didn’t purchase credits:
```
pi@raspberrypi:~/fusion-hat/examples $ sudo python tts_openai.py
2026-01-24 20:57:51.354565538 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
Say: Hello! I'm OpenAI TTS.
[ERROR] OpenAI TTS API request error: 429 Client Error: Too Many Requests for url: https://api.openai.com/v1/audio/speech
Say: with instructions, I can say word sadly, with instructions: 'say it sadly'
[ERROR] OpenAI TTS API request error: 429 Client Error: Too Many Requests for url: https://api.openai.com/v1/audio/speech
Say: or say something dramaticly., with instructions: 'say it dramaticly'
[ERROR] OpenAI TTS API request error: 429 Client Error: Too Many Requests for url: https://api.openai.com/v1/audio/speech
```
SunFounder provides a comparison table for all four TTS options, so users can select the most appropriate one for their requirements.
| Item | Espeak | Pico2Wave | Piper | OpenAI TTS |
|---|---|---|---|---|
| Runs on | Built-in on Raspberry Pi (offline) | Built-in on Raspberry Pi (offline) | Raspberry Pi / PC (offline, needs model) | Cloud (online, needs API key) |
| Voice quality | Robotic | More natural than Espeak | Natural (neural TTS) | Very natural / human-like |
| Controls | Speed, pitch, volume | Limited controls | Choose different voices/models | Choose model and voices |
| Languages | Many (quality varies) | Limited set | Many voices/languages available | Best in English (others vary by availability) |
| Latency / speed | Very fast | Fast | Real-time on Pi 4/5 with “low” models | Network-dependent (usually low latency) |
| Setup | Minimal | Minimal | Download .onnx + .onnx.json models | Create API key, install client |
| Best for | Quick tests, basic prompts | Slightly better offline voice | Local projects with better quality | Highest quality, rich voice options |
Testing Speech-to-Text (STT) with Vosk
Vosk is the only sample provided for Speech-to-Text support:
```shell
sudo python3 stt_vosk_stream.py
```
Here’s the output of the command:
```
vosk-model-small-en-us-0.15.zip: 100%|█████| 39.3M/39.3M [00:03<00:00, 11.6MB/s]
Say something
final: i am see and excerpt
Say something
final: today the weather is very hot
Say something
final: hello everyone
Say something
final: i am
Say something
final: see an excerpt
Say something
final: can i say something
Say something
final: what are you doing
Say something
final: why don't you listen to me
Say something
```
It failed to understand me at times, but it might be because I don’t exactly have a US accent.
This is what the Python code looks like:
```python
from fusion_hat.stt import Vosk as STT

stt = STT(language="en-us")

while True:
    print("Say something")
    for result in stt.listen(stream=True):
        if result["done"]:
            print(f"final: {result['final']}")
        else:
            print(f"partial: {result['partial']}", end="\r", flush=True)
```
Vosk supports over 20 languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, Polish, Uzbek, Korean, Breton, Gujarati, Tajik, Telugu, and Kyrgyz. More languages will be added to the project over time.
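Misrecognitions like “i am see and excerpt” matter less when a project only needs to detect a handful of known commands: instead of requiring an exact hit, the transcript can be fuzzy-matched against the command list. Here is a sketch using Python’s standard `difflib`; the command list and `match_command` helper are hypothetical, not part of fusion-hat:

```python
import difflib

COMMANDS = ["wave the flag", "what time is it", "turn off the light"]

def match_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Return the closest known command, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(transcript.lower().strip(), commands,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_command("wave the flog"))   # minor misrecognition still matches
print(match_command("huh"))             # unrelated noise is rejected
```

Raising `cutoff` makes matching stricter; 0.6 is an arbitrary starting point worth tuning against your own accent and microphone.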
Local LLM and VLM testing with Ollama
The next step is to test local LLM (Large Language Model) and VLM (Vision Language Model) demos using Ollama. Since I only have a Raspberry Pi 5 with 2GB of RAM, I’m not going to run Ollama on the Pi; instead, I installed it on the UP Xtreme i11 Edge mini PC with 16GB RAM:
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
I disabled the automatic launch of Ollama and ran it in server mode on the UP Xtreme i11 Edge computer:
```shell
sudo systemctl stop ollama
sudo systemctl disable ollama
ollama pull llama3.1:8b
OLLAMA_HOST=0.0.0.0 ollama serve
```
I also edited the code on the Raspberry Pi in ~/fusion-hat/examples/llm_ollama.py to add the server’s IP address and select the model I had just installed:
```python
# Change this to your computer IP, if you run it on your pi, then change it to localhost
llm = Ollama(
    ip="192.168.31.12",  # e.g., "192.168.100.145" if remote
    model="llama3.1:8b"
)
```
We can type a prompt with the keyboard to get answers.
```
pi@raspberrypi:~/fusion-hat/examples $ sudo python3 llm_ollama.py
Hello, I am a helpful assistant. How can I help you?
>>> Why is the sky blue?
The color of the sky appears blue because of a phenomenon called scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.
This is known as Rayleigh scattering, named after the British physicist Lord Rayleigh who first described the phenomenon in the late 19th century.
As a result of this scattering, our eyes see the blue light being scattered in all directions and that's why the sky appears blue during the daytime.
It's worth noting that the color of the sky can appear different at different times of day or under different conditions, such as:
* During sunrise and sunset, when the sun is lower in the sky, the light has to travel through more of the atmosphere, which scatters the shorter wavelengths (like blue) even more. This is why the sky often appears red or orange during these times.
* On a cloudy day, the clouds can scatter the light in different ways, making the sky appear gray or white.
But overall, the basic principle of Rayleigh scattering is responsible for the blue color of the sky! Do you have any other questions on this topic?
```
Vision Language Models are also supported, so I’ve installed llava:7b on the server:
```
devkit@UPX-i11:~$ ollama pull llava:7b
```
On the Raspberry Pi 5, we change the IP address from localhost to the IP address of our server and set the model in llm_ollama_with_image.py sample:
```python
llm = Ollama(
    ip="192.168.31.12",  # e.g., "192.168.100.145" if remote
    model="llava:7b"     # change to "moondream:1.8b" or "granite3.2-vision:2b" for 8GB RAM
)
```
I connected a Raspberry Pi Camera Module 3 to the Raspberry Pi 5 before running the command:
```
pi@raspberrypi:~/fusion-hat/examples $ python3 llm_ollama_with_image.py
[0:00:31.352799647] [1918] INFO Camera camera_manager.cpp:340 libcamera v0.6.0+rpt20251202
[0:00:31.361772221] [1921] INFO RPI pisp.cpp:720 libpisp version 1.3.0
[0:00:31.364725184] [1921] INFO IPAProxy ipa_proxy.cpp:180 Using tuning file /usr/share/libcamera/ipa/rpi/pisp/imx708.json
[0:00:31.373335239] [1921] INFO Camera camera_manager.cpp:223 Adding camera '/base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a' for pipeline handler rpi/pisp
[0:00:31.373466406] [1921] INFO RPI pisp.cpp:1181 Registered camera /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media2 and ISP device /dev/media0 using PiSP variant BCM2712_D0
[0:00:31.376763665] [1918] INFO Camera camera.cpp:1215 configuring streams: (0) 1280x720-BGR888/sRGB (1) 1536x864-BGGR_PISP_COMP1/RAW
[0:00:31.376872369] [1921] INFO RPI pisp.cpp:1485 Sensor: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10/RAW - Selected CFE format: 1536x864-PC1B/RAW
Hello, I am a helpful assistant. How can I help you?
>>> Describe the image
The image shows an interior space with a desk and items on it. In the foreground, there appears to be a table with some boxes or materials on top of it. Behind the table, there is another table that seems to have a sign or label attached to it, though the details are not clearly visible. To the left of this second table, there is a partial view of what looks like packaging material, possibly cardboard boxes. The wall in the background is white and featureless, which provides a plain backdrop for the scene. There is no text present in the image to provide additional context or information about the location or purpose of the space.
```
The description is accurate. Here’s the image saved in /tmp/llm-img.jpg.
Google Gemini online LLM test
SunFounder provides instructions for OpenAI, Gemini, Qwen, Grok, DeepSeek, and Doubao. I selected Google Gemini, mainly because it looks like it’s the only one with a free tier.
The first step is to log in to Google AI Studio and click Get API.
I created a new “Fusion HAT” project before clicking “Create key”.
Google AI Studio will generate a Gemini API key we can use for testing.
Copy the key and paste it into secret.py:
```python
GEMINI_API_KEY="AI-your-key"
```
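As an alternative to hardcoding, the key can be read from an environment variable, falling back to the placeholder only when the variable is unset, which keeps the secret out of version control. A small sketch; the `load_gemini_key` helper is hypothetical, while the fusion-hat examples expect the key in secret.py:

```python
import os

def load_gemini_key(default="AI-your-key"):
    """Prefer the GEMINI_API_KEY environment variable over a hardcoded default."""
    return os.environ.get("GEMINI_API_KEY", default)

GEMINI_API_KEY = load_gemini_key()
```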
The llm_gemini.py script uses the gemini-2.5-flash model by default, but you can switch to another one:
```
pi@raspberrypi:~/fusion-hat/examples $ python3 llm_gemini.py
Hello, I am a helpful assistant. How can I help you?
>>> What do you know about CNX Software?
**CNX Software** is a well-known and highly respected technology blog and website that focuses primarily on **embedded systems, open-source hardware, and software development**. It's a go-to resource for developers, engineers, makers, and tech enthusiasts interested in the latest advancements in areas like:
1. **Single Board Computers (SBCs):** They extensively cover popular SBCs like Raspberry Pi, ODROID, NVIDIA Jetson, and many other boards based on chips from Rockchip, Amlogic, Allwinner, NXP, etc.
2. **Development Boards:** News and reviews on various development boards for microcontrollers (e.g., ESP32, ESP8266, STM32) and more complex SoCs.
3. **IoT Devices:** Coverage of new Internet of Things hardware, sensors, and platforms.
4. **Embedded Linux & Android:** Deep dives into custom ROMs, kernels, drivers, and distributions tailored for embedded devices, including Android TV boxes and mini PCs.
5. **Processors & Architectures:** They report on new chip announcements, focusing heavily on ARM-based processors, but also increasingly on RISC-V and other architectures used in embedded contexts.
6. **Mini PCs & TV Boxes:** Often with a focus on their hackability, potential for running alternative operating systems, and open-source firmware.
7. **Open-Source Projects:** Highlighting new open-source hardware designs and software initiatives relevant to the embedded world.
**What kind of content do they provide?**
* **News and Announcements:** Covering new product launches, crowdfunding campaigns (especially on platforms like Kickstarter and Indiegogo), and industry updates.
* **Detailed Reviews:** Often including hands-on testing, benchmarks, power consumption analysis, and in-depth analysis of hardware and software performance. They go beyond surface-level marketing.
* **Tutorials and How-tos:** Guides for setting up development environments, flashing firmware, or using specific features of new hardware.
* **Technical Insights:** Explanations of new technologies, standards, and components relevant to the embedded space.
**Why is it valuable?**
* **In-depth Technical Coverage:** They provide real technical specifications and often test actual performance, making it a reliable source for engineers and developers.
* **Early Adopter Focus:** They are often among the first to report on new embedded hardware coming to market, especially from lesser-known Chinese manufacturers who are prominent in this space.
* **Specialized Niche:** Their specific focus on embedded Linux/Android and open-source hardware makes them an authority in this particular segment of the tech world.
* **Resource for Developers:** It helps developers and makers stay informed about new tools, platforms, and solutions for their projects.
In summary, for anyone involved in or curious about the world of embedded systems, open-source hardware, and low-power computing, CNX Software is an indispensable resource.
>>>
```
I like to ask about CNX Software because most local LLMs will spew out garbage, while online LLMs usually provide relatively accurate information.
Local voice agent using Piper TTS, Vosk STT, and Ollama with Llama3.1:8b
So far, so good, but connecting to a local or online LLM does not use the Fusion HAT+ board features. Let’s work on that by trying a local voice chatbot demo. First, let’s check the source code to change the language for the Vosk STT and Piper TTS if not using US English, and set the proper parameters for Ollama:
```python
# Initialize speech recognition
stt = Vosk(language="en-us")

# Initialize TTS
tts = Piper()
tts.set_model("en_US-amy-low")

# Instructions for the LLM
INSTRUCTIONS = (
    "You are a helpful assistant. Answer directly in plain English. "
    "Do NOT include any hidden thinking, analysis, or tags like <think>."
)
WELCOME = "Hello! I'm your voice chatbot. Speak when you're ready."

# Initialize Ollama connection
llm = Ollama(ip="192.168.31.12", model="llama3.1:8b")
llm.set_max_messages(20)
llm.set_instructions(INSTRUCTIONS)
```
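The INSTRUCTIONS string asks the model not to emit `<think>` tags, but reasoning-tuned models sometimes produce them anyway. As a belt-and-suspenders measure, such blocks could be stripped before the reply reaches the TTS; `strip_thinking` below is a hypothetical helper, not part of fusion-hat:

```python
import re

# Hypothetical post-filter: remove <think>...</think> blocks from a reply
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(reply):
    """Return the reply with any hidden-reasoning blocks removed."""
    return THINK_RE.sub("", reply).strip()

print(strip_thinking("<think>reason about the sky</think>The sky is blue."))
```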
It worked pretty well, although it can get comical with the script hearing “huh” sounds, possibly because of the fan on the UP Xtreme i11 Edge mini PC:
```
pi@raspberrypi:~/fusion-hat/examples $ sudo python3 local_voice_chatbot.py
2026-01-25 19:17:33.762952537 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
Hello! I'm your voice chatbot. Speak when you're ready.
🎤 Listening... (Press Ctrl+C to stop)
[YOU] why is the sky blue
The sky appears blue because of something called Rayleigh scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.
This is why we see the sky appear blue during the day - our eyes are seeing the scattered blue light that's bouncing around everywhere. It's kind of like when you shine a flashlight through a prism and it breaks into different colors; with sunlight, the entire spectrum is being scattered, but our eyes pick up on the blue part.
At sunrise and sunset, the sky can appear more red or orange because the sun's light has to travel through more of the atmosphere to reach us, scattering away some of the shorter wavelengths.
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
That's a pretty normal reaction when someone first learns about it! It might seem a bit weird that the sky is blue and not just gray or white. But it's actually quite cool (and scientifically interesting) how light behaves in our atmosphere.
If you want to know more, I can try explaining it in simpler terms or give you some analogies to help make sense of it!
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Let me try again with a different explanation. Imagine you're holding a big jar of glitter and shining a flashlight through it. The light scatters off the tiny particles of glitter, making it look like the entire jar is filled with sparkling colors.
That's kind of what's happening with the sky when sunlight enters our atmosphere. The tiny molecules in the air are like the glitter, scattering the blue light in all directions and making the sky appear blue to us.
Does that make more sense?
🎤 Listening... (Press Ctrl+C to stop)
[YOU] it does thank you
I'm glad I could help explain it in a way that made sense to you. If you have any other questions or need help with anything else, feel free to ask!
🎤 Listening... (Press Ctrl+C to stop)
```
The application will continue listening, and sometimes hears “huh” even when nobody is in the room:
```
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Still with the huh's I see! Okay, okay, I get it. Maybe we're just done for real this time. Bye!
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'm leaving. Seriously this time. Have a great day!
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'll just close the conversation now. Goodbye!
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Goodbye again. (This is really it, I promise)
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
FINAL GOODBYE.
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'll just leave this conversation now and come back when you're ready to talk. Have a nice day!
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Goodbye, I've left.
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Conversation ended.
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'm out of here.
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
*ended*
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
This conversation has been closed.
🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Goodbye!
🎤 Listening... (Press Ctrl+C to stop)
```
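One pragmatic mitigation for these phantom “huh” triggers is to gate transcripts before they reach the LLM, dropping anything too short or made entirely of filler words. A hypothetical sketch, not part of fusion-hat; the word list and threshold are arbitrary starting points:

```python
# Hypothetical gating helper: drop very short or noise-only transcripts
# before they are sent to the LLM.
NOISE_WORDS = {"huh", "uh", "um", "hmm"}

def is_real_utterance(transcript, min_words=2):
    """Return True only for transcripts worth forwarding to the LLM."""
    words = transcript.lower().split()
    if len(words) < min_words:
        return False                       # too short: likely noise
    return not all(w in NOISE_WORDS for w in words)

print(is_real_utterance("huh"))                  # short noise is dropped
print(is_real_utterance("why is the sky blue"))  # real question passes
```

A complementary fix would be physical: moving the microphone away from the mini PC fan, or lowering the STT input gain.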
AI voice assistant with Google Gemini
Let’s do something even more advanced by converting our Raspberry Pi 5 into what amounts to a smart speaker using the AI voice assistant demo. Again, I relied on Google Gemini and edited the voice_assistant.py script with the proper parameters:
```python
from fusion_hat.voice_assistant import VoiceAssistant
from fusion_hat.llm import Gemini as LLM
from secret import GEMINI_API_KEY as API_KEY

llm = LLM(
    api_key=API_KEY,
    model="gemini-2.5-flash",
)

# Robot name
NAME = "Buddy"

# Enable image, need to set up a multimodal language model
WITH_IMAGE = True

# Set models and languages
LLM_MODEL = "gemini-2.5-flash"
TTS_MODEL = "en_US-amy-low"
STT_LANGUAGE = "en-us"
```
If you don’t have a camera connected, you’ll also want to set WITH_IMAGE to False.
```
pi@raspberrypi:~/fusion-hat/examples $ sudo python3 voice_assistant.py
2026-01-31 14:36:12.874392721 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
[0:05:28.231334914] [3337] INFO Camera camera_manager.cpp:340 libcamera v0.6.0+rpt20251202
[0:05:28.240260201] [3863] INFO RPI pisp.cpp:720 libpisp version 1.3.0
[0:05:28.242815656] [3863] INFO IPAProxy ipa_proxy.cpp:180 Using tuning file /usr/share/libcamera/ipa/rpi/pisp/imx708.json
[0:05:28.250784150] [3863] INFO Camera camera_manager.cpp:223 Adding camera '/base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a' for pipeline handler rpi/pisp
[0:05:28.250828205] [3863] INFO RPI pisp.cpp:1181 Registered camera /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media0 and ISP device /dev/media1 using PiSP variant BCM2712_D0
[0:05:28.254004103] [3337] INFO Camera camera.cpp:1215 configuring streams: (0) 640x480-XBGR8888/sRGB (1) 1536x864-BGGR_PISP_COMP1/RAW
[0:05:28.254121621] [3863] INFO RPI pisp.cpp:1485 Sensor: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10/RAW - Selected CFE format: 1536x864-PC1B/RAW
>>> heard: hey buddy
Waked, Listening ...
heard: what is the sky blue
The sky appears blue because of something called **Rayleigh scattering**.
Here's a simplified explanation:
1. **Sunlight is made of all colors:** The light from the sun looks white, but it's actually a mix of all the colors of the rainbow (red, orange, yellow, green, blue, indigo, violet).
2. **Earth's atmosphere:** Our atmosphere is made up of tiny gas molecules (like nitrogen and oxygen) and small particles.
3. **Scattering light:** When sunlight enters the atmosphere, these tiny molecules and particles scatter the different colors of light in different directions.
4. **Blue light scatters more:** Blue and violet light have shorter wavelengths than red and yellow light. Shorter wavelengths are scattered *much more effectively* by the tiny gas molecules in the atmosphere.
5. **Why blue, not violet?** While violet light scatters even more than blue, our eyes are more sensitive to blue light. Also, some of the violet light is absorbed higher in the atmosphere. So, the dominant color we perceive is blue.
So, when you look up, you're seeing blue light that has been scattered across the sky and toward your eyes from all directions.
>>>
```
Since I had the Raspberry Pi Camera Module 3 connected to the board, I also asked it to use the camera:
```
heard: hey buddy
Waked, Listening ...
heard: can you use the camera and what do you see
Okay, using a camera on this image, here's what I "see" and can describe:
The image appears to be taken indoors and is rotated 90 degrees clockwise.
**Assuming the image is rotated upright for description:**
* **Left Side (Wall and Window):**
  * The upper half of the left side is dominated by a plain, light-colored (off-white or pale gray) wall with some minor smudges or imperfections.
  * A dark brown horizontal band separates the wall from the window structure below it.
  * Below the brown band is a window. It has a light-colored (likely aluminum or white) frame and appears to have a screen. Through the window, faint vertical bars are visible, suggesting security bars or a fence outside. The view outside is bright but indistinct.
* **Right Side (Table/Surface and Objects):**
  * The entire right side features a light brown surface with a distinct wavy wood grain pattern, resembling a table or shelf. There's a small, dark circular mark (possibly a screw hole or stain) on this surface towards the center.
  * Stacked on this wooden surface are two boxes:
    * The **top box** is dark gray or black. Some faint white text is visible on its side, but it's largely illegible, appearing to be technical specifications or branding (e.g., "ROHS © FC" and other smaller text, which is reversed due to the image rotation).
    * Below it is a **white rectangular box**. On its visible side, there is clear blue text that reads "**Sunfounder**" (this text is also reversed due to the image rotation, but easily deciphered). Below "Sunfounder," smaller text indicates:
      * **CN0522D**
      * **Made in China**
      * A date: **2023.11.26**
  * Part of a dark blue or dark green fabric item (possibly a bag or clothing) is visible next to and beneath the white box.
  * Further to the right, a crumpled white plastic bag or sheet is partially visible.
In summary, it's a photograph of a room interior, showing a wall, a window, and a wooden surface with stacked boxes and other items on it.
```
Pretty neat! There's no video here, as I'll showcase that part in another demo using a servo in the section below. Note that I sometimes had no audio at all, but I noticed that running the command:
sudo python3 tts_pico2wave.py
fixes the audio issue in the voice_assistant.py sample. Neither SunFounder nor I are sure why, and I could not reproduce the problem consistently.
Servo control with TTS
The Fusion HAT+ documentation focuses on audio and LLM/VLM control, but SunFounder is also working on an “AI Fusion Lab Kit” that will ship with the Fusion HAT+, as well as motors, servos, and so on. We can check the documentation for the upcoming kit for instructions on how to control a servo. I connected an SG90 servo to port 0 as shown in the photo below, and added a Thai flag to it for dramatic effect 😉
Let’s check out the sample code for the AI Lab kit:
cd ~
git clone https://github.com/sunfounder/ai-lab-kit.git --depth 1
cd ai-lab-kit/python/
and run the servo demo:
python3 1.7_Servo.py
The servo will sweep back and forth in an infinite loop, so the low-level control is sorted out. The relevant Python code is shown below:
from fusion_hat.servo import Servo  # Import the Servo class for controlling servos
from time import sleep              # Import sleep for timing delays

servo = Servo(0)  # Create a Servo object on channel 0

while True:  # Loop forever
    # Sweep from -90° to +90° in steps of 10°
    for i in range(-90, 91, 10):
        servo.angle(i)  # Set servo to angle i
        sleep(0.1)      # Small delay for smooth movement

    # Sweep back from +90° to -90° in steps of -10°
    for i in range(90, -91, -10):
        servo.angle(i)  # Set servo to angle i
        sleep(0.1)      # Small delay for smooth movement
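For background, a call like servo.angle() ultimately translates the angle into a PWM pulse width. A typical SG90 expects a ~50 Hz signal with pulses between roughly 0.5 ms (-90°) and 2.5 ms (+90°). Here's a minimal sketch of that mapping, using common SG90 timing figures rather than values taken from the fusion_hat source:

```python
def angle_to_pulse_us(angle: float,
                      min_us: float = 500.0,
                      max_us: float = 2500.0) -> float:
    """Map an angle in [-90, 90] degrees to a pulse width in microseconds."""
    angle = max(-90.0, min(90.0, angle))  # Clamp to the servo's travel range
    return min_us + (angle + 90.0) / 180.0 * (max_us - min_us)

# At 50 Hz the PWM period is 20 ms, so the duty cycle is pulse / 20000
for a in (-90, 0, 90):
    pulse = angle_to_pulse_us(a)
    print(f"{a:+4d} deg -> {pulse:.0f} us ({pulse / 20000:.1%} duty)")
```

With these assumed timings, -90° maps to a 500 µs pulse, 0° to 1500 µs, and +90° to 2500 µs; different servo models (and libraries) may use a narrower 1–2 ms range.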
Let’s use that to mix TTS, Vosk STT, Google Gemini LLM, and servo control. The goal here is to rotate the servo when I say “wave the flag”, and otherwise send the request to Google Gemini. Note that I switched from gemini-2.5-flash to gemini-2.5-flash-lite since I ran out of the 20 daily credits during testing. Another way would be to set up billing and buy some credits.
Here’s the code for the demo:
from fusion_hat.voice_assistant import VoiceAssistant
from fusion_hat.llm import Gemini
from fusion_hat.servo import Servo
from secret import GEMINI_API_KEY
import time

# Robot name
NAME = "Buddy"

# Enable image, need to set up a multimodal language model
WITH_IMAGE = True

# Set models and languages
LLM_MODEL = "gemini-2.5-flash-lite"
TTS_MODEL = "en_US-ryan-low"
STT_LANGUAGE = "en-us"

# Enable keyboard input
KEYBOARD_ENABLE = True

# Enable wake word
WAKE_ENABLE = True
WAKE_WORD = [f"hey {NAME.lower()}"]

# Set wake word answer, set empty to disable
ANSWER_ON_WAKE = "Hi there"

# Welcome message
WELCOME = f"Hi, I'm {NAME}. Wake me up with: " + ", ".join(WAKE_WORD)

# Set instructions
INSTRUCTIONS = f"""
You are {NAME}, a helpful and friendly voice assistant.
Keep answers short, natural, and fun.
If the user says anything about waving a flag, only reply "Done" and nothing else.
It's handled by the hardware.
"""

class FlagWavingAssistant(VoiceAssistant):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.servo = Servo(0)
        self.answer_on_wake = ""

    def wave_flag(self):
        # Wave the flag back and forth 3 times
        self.tts.say("Waving the flag right now!")
        for _ in range(3):
            self.servo.angle(-45)
            time.sleep(0.5)
            self.servo.angle(45)
            time.sleep(0.5)
        self.servo.angle(0)  # Reset to center

    def on_heard(self, text: str):
        text_lower = text.lower().strip()
        if "wave the flag" in text_lower or "wave flag" in text_lower:
            self.wave_flag()
            return

# ────────────────────────────────────────────────
# Main setup
# ────────────────────────────────────────────────

# Initialize Gemini LLM
llm = Gemini(
    api_key=GEMINI_API_KEY,
    model="gemini-2.5-flash-lite",
)

# Create your custom assistant
va = FlagWavingAssistant(
    llm,
    name=NAME,
    with_image=WITH_IMAGE,
    tts_model=TTS_MODEL,
    stt_language=STT_LANGUAGE,
    keyboard_enable=KEYBOARD_ENABLE,
    wake_enable=WAKE_ENABLE,
    wake_word=WAKE_WORD,
    answer_on_wake=ANSWER_ON_WAKE,
    welcome=WELCOME,
    instructions=INSTRUCTIONS,
)

if __name__ == "__main__":
    try:
        va.run()  # Starts the full loop: wake → listen → think → speak
    except KeyboardInterrupt:
        va.say("Goodbye!")
        print("Stopped by user.")
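If you want to handle more voice commands than just "wave the flag", the keyword matching in on_heard() can be generalized into a small dispatch table instead of a chain of if statements. Here's a possible sketch; the extra command phrases and handlers are hypothetical, not part of the fusion_hat API:

```python
def make_dispatcher(commands):
    """Return a function that runs the first handler whose phrase appears in the text."""
    def dispatch(text: str) -> bool:
        t = text.lower().strip()
        for phrase, handler in commands.items():
            if phrase in t:
                handler()
                return True   # Handled locally, skip the LLM
        return False          # Not a hardware command, forward to the LLM
    return dispatch

# Hypothetical handlers standing in for the servo/motor methods
actions = {
    "wave the flag": lambda: print("waving flag"),
    "stop waving":   lambda: print("centering servo"),
}
dispatch = make_dispatcher(actions)
print(dispatch("please wave the flag now"))  # → True (prints "waving flag")
print(dispatch("why is the sky blue"))       # → False
```

Inside on_heard(), you would call dispatch(text) and only pass the request on to Gemini when it returns False.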
The demo below demonstrates wake word support, Gemini LLM with a general question and a description from a photo taken with the Raspberry Pi Camera Module 3, and finally waving the flag with a voice command using TTS.
Output from the command:
pi@raspberrypi:~/fusion-hat/examples $ sudo python3 voice_assistant_servo.py
2026-02-01 11:10:56.476444466 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
[0:19:39.838079874] [3034]  INFO Camera camera_manager.cpp:340 libcamera v0.6.0+rpt20251202
[0:19:39.851522012] [3560]  INFO RPI pisp.cpp:720 libpisp version 1.3.0
[0:19:39.867269263] [3560]  INFO IPAProxy ipa_proxy.cpp:180 Using tuning file /usr/share/libcamera/ipa/rpi/pisp/imx708.json
[0:19:39.878685104] [3560]  INFO Camera camera_manager.cpp:223 Adding camera '/base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a' for pipeline handler rpi/pisp
[0:19:39.878735845] [3560]  INFO RPI pisp.cpp:1181 Registered camera /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media2 and ISP device /dev/media0 using PiSP variant BCM2712_D0
[0:19:39.882767570] [3034]  INFO Camera camera.cpp:1215 configuring streams: (0) 640x480-XBGR8888/sRGB (1) 1536x864-BGGR_PISP_COMP1/RAW
[0:19:39.882904718] [3560]  INFO RPI pisp.cpp:1485 Sensor: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10/RAW - Selected CFE format: 1536x864-PC1B/RAW
>>> heard: hey buddy
Waked, Listening ...
heard: why is the sky blue
The sky is blue because of the way the Earth's atmosphere scatters sunlight! Blue light waves are shorter, so they scatter more than other colors, making the sky appear blue.
>>> heard: hey buddy
Waked, Listening ...
heard: take a photo on describe what you see
I see a stuffed character, maybe a mascot, wearing a white hat with a red pompom, and a red cape or scarf. It looks like it's outdoors, possibly at an event or amusement park, with wooden structures and greenery in the background. The photo is taken from a low angle, looking up at the character.
>>> heard: hey buddy
Waked, Listening ...
heard: wave the flag
Done
Photo from Raspberry Pi Camera Module 3:
My first try with the Gemini 2.5 Flash model was a bit more verbose and would have likely described the Fusion HAT+ for Raspberry Pi package as well. Switching to the Gemini 2.5 Flash Lite model generates shorter answers. I also removed the “Keep answers short, natural, and fun” text from the instructions, and I still got relatively short answers from the Lite model.
It also works on battery, so users can disconnect the USB cable to keep using it on the go for a few hours. The power button that allows a safe shutdown is also very useful, considering most people will run it headless (without a display).
Conclusion
I had fun playing with the Fusion HAT+ connected to a Raspberry Pi 5, as SunFounder provides excellent documentation to get started with text-to-speech (TTS), speech-to-text (STT), local LLM/VLM with Ollama, online LLMs such as Google Gemini, and also to create your own voice assistant.
It also offers motor, servo, and other I/O interfaces, along with a battery, which makes it especially useful for voice-controlled robots. Having said that, it could also be integrated into a Raspberry Pi-based smart home controller, or used for education, especially since the price tag is rather competitive. The motor/servo/IO control documentation is rather limited for the Fusion HAT+ itself, but I could rely on the documentation for the upcoming SunFounder AI Fusion Lab Kit, which is much more thorough in that regard and will ship with accessories to create all sorts of fun demos and projects.
I’d like to thank SunFounder for sending the Fusion HAT+ for review. It is currently sold for $34.99 on AliExpress, Amazon, and the company’s online store, including a battery pack.

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.