The Ubo Pod Developer Edition (DE) is an open-source AI vision and conversational voice assistant platform built around the Raspberry Pi 4 or 5, and designed for developers who want more control over their AI experiences.
The device aims to replace black-box AI assistants like the Amazon Echo or Google Nest with an open-hardware smart speaker running open-source software, offering features such as speech-to-text, LLMs/VLMs, text-to-speech, tool calling, and multiple trigger mechanisms, among others. The Ubo Pod supports both cloud-based and fully local, private AI, and features an embedded GUI on the integrated display and a WebUI for no-code setup.
Ubo Pod specifications:
- SBC
- Ubo Pro 4 – Raspberry Pi 4
- Ubo Pro 5 – Raspberry Pi 5
- Storage
- 32GB microSD card included, preloaded with the OS
- Ubo Pro 5 – M.2 PCIe socket for NVMe SSD (or AI accelerator)
- Display – 1.54-inch color TFT IPS display with 240 x 240 resolution
- Camera – Built-in 5MP camera; Ubo Pro 4 supports the official Raspberry Pi camera modules V1 and V2, and Ubo Pro 5 also supports the Raspberry Pi Camera Module 3.
- Audio
- Dual microphones for stereo recording at up to 48 kHz
- Stereo speakers, up to 48 kHz, <0.1% THD with 1 W per channel
- Line Out – 40 mW output power into 16 Ohm at 3.3 VDC, THD -75 dB at 20 mW, SNR 90 dB with 16 Ohm load; jack insertion detection (WM8960)
- Networking (on Raspberry Pi SBC)
- Gigabit Ethernet RJ45 port
- WiFi 5 and Bluetooth 5.0
- USB (on Raspberry Pi SBC) – 2x USB 3.0 ports, 2x USB 2.0 ports
- Sensors
- Temperature sensor with ±1 °C accuracy over -25 °C to +100 °C range (PCT2075 by NXP)
- Ambient light sensor – 0 to 120 klx, with resolution down to 0.0036 lx/ct (VEML7700)
- Misc
- Power button
- 7x buttons on a soft-touch silicone keypad
- LED ring with 27x individually controllable RGB LEDs (SK6812), NeoPixel-compatible
- IR receiver up to 5m range (TSOP75238)
- IR transmitters (940 nm), omnidirectional, with high radiant power and speed (4x VSMB10940)
- Physical privacy curtain for the camera and a hardware disconnect switch for microphones
- Active thermal management
- Dimensions – 130 x 99 x 52 mm
- Weight – 340 grams
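Since the SK6812 ring is NeoPixel-compatible, it can be driven with standard WS281x libraries on the Pi. The sketch below only computes one rainbow frame for the 27 pixels in pure Python; it is an illustration rather than Ubo's own code, and the actual hardware write (which needs the device and a driver library such as rpi_ws281x) is shown only as a comment:

```python
LED_COUNT = 27  # SK6812 LEDs on the Ubo Pod ring

def wheel(pos):
    """Map 0-255 to an (R, G, B) color around the hue wheel."""
    pos %= 256
    if pos < 85:
        return (255 - pos * 3, pos * 3, 0)
    if pos < 170:
        pos -= 85
        return (0, 255 - pos * 3, pos * 3)
    pos -= 170
    return (pos * 3, 0, 255 - pos * 3)

def rainbow_frame(offset=0):
    """One full-ring rainbow frame, rotated by `offset` steps."""
    return [wheel(i * 256 // LED_COUNT + offset) for i in range(LED_COUNT)]

# On the device you would push the frame out with a WS281x/NeoPixel
# driver, e.g. (hypothetical wiring, driver object not shown):
#   for i, (r, g, b) in enumerate(rainbow_frame()):
#       strip.setPixelColor(i, Color(r, g, b))
#   strip.show()
```

Animating the ring is then just a matter of calling `rainbow_frame(offset)` with an incrementing offset on each refresh.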

On the software front, the system runs Raspberry Pi OS with the Ubo App (Python) available on GitHub, and you don’t need to get a Ubo Pod to test it out, since it can be installed on Raspberry Pi 4/5 SBCs.
Ubo Pod supports over 50 AI service providers for speech-to-text, text-to-speech, memory, vision, and LLMs, as well as local on-device/on-premise options. You can select the provider via the built-in display or the web interface. A gRPC API lets developers add features in just a few lines of code. Practical examples include the Memo voice assistant with conversation memory/context retention, image generation using voice input, image description with VLMs, controlling your TV through the built-in IR transmitter and voice commands, and more.
Besides the software, the hardware design is also open-source, and you’ll find GitHub repositories for the custom PCBs and mechanical files. The video embedded at the end of this post provides a quick overview of the system, but a longer 30-minute demo is also available for more details.
Ubo Technology launched its Raspberry Pi 4/5-based personal AI assistant on Kickstarter with a $25,000 funding target. Rewards start at $109 for the Ubo Pro 4 and $129 for the Ubo Pro 5, neither of which ships with a Raspberry Pi SBC, so you must add your own. Additional information may also be found on the project’s website.

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.

All the big talk about AI and this is just a Raspberry Pi case with a speaker? I don’t see what makes this worth $100, but maybe that’s just me.
I was expecting some TPU or accelerator to augment the PI’s capability to do some AI.
AI is the current hot buzzword, like cloud was a decade ago. Every product is getting AI tacked on even if it just means it could run a tiny LLM with middling speed.
I think at this point my brain has started to ignore it completely, as it reached its peak with the new AMD stuff, “AMD Strix Halo AI Max+ 395” or whatever.
All the SBCs now have to put AI in the name or at the top of the marketing, and talk about fancy NPUs that take up precious silicon space and can run a one-off vendor demo at 20 tokens per second but are impossible to get working outside of that.
@john I forgot to mention one thing. If you are already running Ollama on a local machine, you can easily direct Ubo to use the Ollama API endpoint on the local network instead of running it on the Pi. We are testing several PCIe AI accelerators right now and will be offering them as upgrade options down the road. I think it is a bit early for that, but we are seeing more capable options becoming available.
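For anyone curious what pointing at a LAN Ollama instance looks like in practice, here is a minimal Python sketch against Ollama's documented REST API (`/api/generate` on the default port 11434). The host address and model name are placeholder examples, and this is not Ubo's own client code:

```python
import json
import urllib.request

# Example LAN address of the machine running Ollama (placeholder)
OLLAMA_HOST = "http://192.168.1.50:11434"

def build_generate_request(prompt, model="llama3.2"):
    """Build a POST request for Ollama's documented /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def ask(prompt):
    """Send the prompt to the LAN Ollama server and return the generated text.

    Requires a reachable Ollama instance; the response JSON carries the
    generated text in its "response" field.
    """
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With `stream: False` the server returns one JSON object per request, which keeps the client trivial; a streaming client would instead read newline-delimited JSON chunks.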
Exactly. I’d also expect 4 or 6 microphones. Just like Alexa.
I’d also expect a specific AI accelerator because only then the software support will be good enough.
@markus Right now we have a 2-mic setup which works great. Surprisingly, we can do a lot on the software side (audio post-processing) without requiring more mics. Also, this product has a modular and upgradable design, and in the future you can just swap the top HAT to upgrade without needing to buy the full hardware. We will be offering all sub-components separately to enable developers to create derivative designs or easily repair their devices if they break something.
Hi john, I am the creator of this product and wanted to clarify a few things: This is a joint hardware and software project, and we have invested over 2 years in the software stack, which is quite powerful and capable. You can check the GitHub repo for the software here and learn more about its event-driven and modular architecture: github.com/ubopod/ubo_app.
The software also has a WebUI that allows for no-code setup and data entry. Developers can use the GUI in their applications for apps needing a UI/UX.
In terms of hardware capability, we pack a bunch of additional peripherals besides audio. We have a built-in GUI, an on-board camera for vision applications, an ambient light sensor (for auto-activating night mode), a temperature sensor, a wide-band infrared receiver, and an omnidirectional high-power IR transmitter (to control other IR-enabled devices). The hardware side is constantly evolving, and thanks to its modular architecture, developers can partially upgrade their hardware as we introduce new capabilities (dedicated AI accelerator, etc.) without needing to buy new hardware.
Happy to address any questions you may have.
If you want to use a Raspberry Pi or other Linux SBC for Home Assistant voice control, then check out the Open Home Foundation’s experimental Linux Voice Assistant project: https://github.com/OHF-Voice/linux-voice-assistant
Home Assistant voice control is great, but it is limited. We support over 50 cloud-based services out of the box for various engines (STT, VLM/LLM, TTS, image generation). You can also set up memory layers (with Mem0) for long-term memory. We also support MCP tool calling that can connect to the Home Assistant MCP server to control devices inside Home Assistant. The AI features are built on top of pipecat-ai, which is a very modern and powerful orchestration framework: https://github.com/pipecat-ai/pipecat
Note that to clearly pick up voices and properly recognize voice commands in a noisy room, you normally need a far-field microphone array solution (such as the XMOS XVF3800 or XU316) that implements onboard DSP audio processing and noise removal (voice clean-up) algorithms such as Interference Cancellation (IC), Acoustic Echo Cancellation (AEC), Noise Suppression (NS), and Automatic Gain Control (AGC), plus other audio post-processing, before the audio is passed on to the wake-word and speech-to-text pipelines.
@Hedda With recent innovations in deep learning, you can now do a lot of this in software. Please see my longer response below:
1) Software processing versus hardware processing: Since we have a capable processor, we can take advantage of it instead of running the algorithms on dedicated hardware. If the system is MCU-based, then hardware DSP audio processing must definitely be pursued. For example, we do AEC in software (it adds a tiny bit of delay, but it is not noticeable) as well as noise cancellation.
Regarding noise cancellation, DSP algorithms typically implement classic noise suppression/cancellation methods; however, with the emergence of deep learning, you can do noise cancellation through a completely different paradigm. For example, check out Koala by Picovoice: https://picovoice.ai/platform/koala/.
Overlapping speech is, however, a much more challenging problem that even the most capable voice assistant devices still struggle with. It is typically addressed by source separation, filtering out other voices to isolate a specific speaker (which requires training on that speaker’s voice).
2) Hardware cost: Right now we are pricing this product very competitively to lower the barrier to entry for developers. XMOS ICs are quite expensive, and with the recent China tariffs on top of that, we would have to increase our MSRP by another $50 to $70 to account for such an increase in BOM. The software-only approach with 2 microphones, in contrast, only adds a one-time software engineering overhead.
3) Modular upgradable hardware: We designed the hardware to be completely modular to allow developers to partially upgrade their devices (instead of buying a completely new device). That means it will be compatible with the Raspberry Pi 6, 7, etc. The top module, which includes the audio functions, can be swapped for new models down the road for a small incremental upgrade fee to take advantage of new capabilities.
4) Made for developers: Right now we are not marketing this device to a broader non-developer audience; it is more of a development kit and a playground to try the various features we have on board and use the software foundation to quickly build interactive experiences. Given that this is an open-source design (hardware/software), it will evolve over time with the participation of other developers on both the software and hardware sides.