Raspberry Pi AI HAT+ 2 review – A 40 TOPS AI accelerator tested with Computer Vision, LLM, and VLM workloads

Raspberry Pi sent me a sample of their AI HAT+ 2 generative AI accelerator based on Hailo-10H for review. The 40 TOPS AI accelerator is advertised as being suitable for LLMs (Large Language Models) and VLM (Vision Language Models), while delivering about the same performance as the first-generation AI HAT+ (Hailo-8) for AI vision/computer vision models.

After going through an unboxing, I’ll assemble the AI HAT+ 2 onto a Raspberry Pi 5 with 2GB of RAM fitted with a Raspberry Pi Camera Module 3, quickly check whether AI vision models work as expected, and then spend more time testing LLM and VLM samples.

Raspberry Pi AI HAT+ 2 unboxing

My sample had a somewhat long and rough trip from the UK to Thailand, and the package did not look that good when DHL delivered it.

Raspberry Pi AI HAT+ 2 package DHL

But luckily, nothing was damaged, and I got the AI HAT+ 2 with a heatsink, a 40-pin GPIO extension header, plastic standoffs and screws, and a sheet explaining how to assemble the heatsink.

Raspberry Pi AI HAT+ 2 unboxing
The Raspberry Pi AI HAT+ 2 has two main ICs: the Hailo-10H AI accelerator and an 8GB memory chip, which should allow us to run LLMs and VLMs on a Raspberry Pi 5 with limited memory. This differs from the Hailo-8-based AI HAT+, which relies on the Raspberry Pi 5’s own memory and is only suitable for computer vision applications.

Hailo 10H HAT
There’s nothing much on the bottom apart from passive components and the PCIe flat cable.

Raspberry Pi AI HAT+ 2 bottom

Raspberry Pi AI HAT+ 2 assembly with Raspberry Pi 5

The assembly is pretty straightforward. First, I secured the heatsink on top of the HAT after peeling the protective film and pressing the two spring clips attached to the heatsink. I also installed four plastic standoffs on the Raspberry Pi 5 and inserted the GPIO extension header.

Raspberry Pi AI HAT+ 2 installation
At that point, you’ll likely want to insert the PCIe flat cable into the 16-pin PCIe FFC connector of the Pi 5, before placing the AI HAT+ 2 on top and securing it with the four remaining screws.


Raspberry Pi 5 Raspberry Pi AI HAT+ 2

If you plan to use another HAT+ on top, make sure you don’t push the GPIO extension header fully in, since the GPIO pins won’t be accessible otherwise.

Raspberry Pi AI HAT+ 2 installation PCIe cable
Note that you can’t use an NVMe SSD with the AI HAT+ 2 unless you add another HAT with a PCIe switch, such as the HatBRICK! Commander.

Install Raspberry Pi OS Trixie 64-bit and Hailo package

My board was still running Raspberry Pi OS Bookworm based on Debian 12, but the AI HAT+ 2 requires the latest Raspberry Pi OS Trixie 64-bit. So I removed the 32GB microSD card and flashed it with the latest OS. If Trixie is already installed on your board, you may just want to make sure it’s up-to-date:


We can now install the package required by the Hailo-10H accelerator and reboot:


We can confirm the HAILO10H accelerator is properly detected:

Computer Vision samples with rpicam-apps

As I understand it, the Hailo-10H has no advantage over the Hailo-8 when it comes to computer vision processing, but it’s still important to verify that those workloads work. So I installed rpicam-apps to repeat at least one of the tests I did with the Raspberry Pi AI HAT+.


As mentioned in the introduction, I also connected a Raspberry Pi Camera Module 3 to the single board computer, so I can run the YOLOv8 model on the Hailo-10H HAT:


Raspberry Pi AI HAT+ 2 Hailo-10H Computer Vision Review

The Hailo-10H device is detected, and everything runs as smoothly as it did with the Hailo-8, at a higher FPS than relying on the Raspberry Pi 5’s CPU alone. I won’t spend time testing other computer vision samples, and will instead focus on LLMs and VLMs for the rest of the review.

Testing LLMs on the Raspberry Pi AI HAT+ 2

Running LLMs on the command line with Hailo Ollama server

We’ll mostly follow the instructions posted on the Raspberry Pi website.  The first step is to install the Hailo Ollama server (version 5.1.1):


Let’s start the server in one Terminal window:


The server runs on port 8000 and exposes a REST API for model inference. We can now open another Terminal window to list the available models:


That would be five models. Let’s start by downloading the DeepSeek model:


We can send a request to translate some text from English to French:


After some wait, it will start to output data:


It works, although the translation is not quite accurate…  It generated 186 tokens in 28.6 seconds, or about 6.5 tokens/s. We can download and play around with other models on the command line, but I won’t go into detail for each here. These rather small models with 1.5 to 3 billion parameters are fine for testing, but most people will probably build custom models optimized for their application.
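Since hailo-ollama mimics Ollama’s server, the same request can also be issued programmatically. Here’s a minimal Python sketch, assuming the server follows Ollama’s /api/generate convention and the model name used above; treat the endpoint path and field names as assumptions:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> dict:
    """Build a request payload for an Ollama-style /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def send_prompt(payload: dict, host: str = "http://localhost:8000") -> dict:
    """POST the payload to the hailo-ollama server and return the parsed reply."""
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_generate_request(
    "deepseek-r1:1.5b",  # model name is an assumption
    "Translate to French: The cat is on the table.",
)
# send_prompt(payload)  # uncomment on a Pi with the hailo-ollama server running
```

This is just a sketch of the idea; the curl command used in the review does the same thing from the shell.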

Running models in a web browser with Open WebUI

I’m now going to show how to run the models in a web browser. As we saw in our review of the UP Squared Pro TWL AI Dev Kit with a Hailo-8 AI accelerator, the Hailo SDK is very finicky about the Python version used. Since the Python version in Raspberry Pi OS Trixie is not compatible, we’ll use Docker instead.

You may first want to remove old Docker packages (skip if you’ve just installed Raspberry Pi OS Trixie):


Now add the Docker PGP key:


Create the file /etc/apt/sources.list.d/docker.sources as a superuser with:


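For reference, a deb822-style docker.sources for Debian Trixie would look something like the snippet below. The Signed-By path is an assumption based on where the PGP key from the previous step is typically saved:

```
Types: deb
URIs: https://download.docker.com/linux/debian
Suites: trixie
Components: stable
Signed-By: /etc/apt/keyrings/docker.asc
```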
Install and run Docker:


Add the current user to the docker group:


Exit the terminal, and log in again to test Docker:


Output:


All good. Now we can install and use Open WebUI, while the hailo-ollama server is running in another Terminal window:


It will take a little while to start, but then you can access Open WebUI using http://localhost:8080 on the Raspberry Pi, or http://<Pi’s IP address>:8080 from any computer on the LAN. I used my laptop to access the web interface.

Open WebUI Raspberry Pi 5

Clicking on Get Started will bring you to an interface to create an admin account. I entered a valid email address, but it’s not even needed, as I never received an email.
Open WebUI registration
After that, you’ll be brought to the Open WebUI dashboard, where you can select one of the models and chat with it.

Open WebUI Raspberry Pi deepseek r1

I used the same request as in the command line: “Translate to French: The cat is on the table.”
Open WebUI DeepSeek R1 Distill Qwen Raspberry Pi 5 AI Hat+ 2

There’s no benchmark data in the interface. I tried to install the Time Token Tracker and Chat Metrics functions to get a tokens/s value, but neither worked for me.

Benchmarking LLM performance of the AI HAT+ 2

One way to get performance data is to hover over the Information icon.

Open WebUI tokens duration

Here, we can get “total_tokens” and “total_duration” values to calculate the performance, but it’s not ideal. So instead, I did that on the command line with the jq utility.
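The calculation itself is trivial. Here’s a small Python sketch using the two fields shown in the screenshot; it assumes “total_duration” is reported in nanoseconds, following Ollama’s API convention, which may not match hailo-ollama exactly:

```python
import json

def tokens_per_second(response_json: str) -> float:
    """Compute tokens/s from an Ollama-style response.

    Assumes "total_duration" is in nanoseconds, as in Ollama's API.
    """
    data = json.loads(response_json)
    return data["total_tokens"] / (data["total_duration"] / 1e9)

# Example with values matching the DeepSeek R1 run earlier:
# 186 tokens in 28.6 seconds
sample = json.dumps({"total_tokens": 186, "total_duration": 28_600_000_000})
print(round(tokens_per_second(sample), 1))  # → 6.5
```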

Here are the results for the English-to-French translation requests:

DeepSeek R1 1.5B:


Qwen2 1.5B:


Llama3.2 3B:


Qwen 2.5 instruct 1.5B:


I also benchmarked Qwen 2.5 Coder 1.5B, asking it to write a function in Python:


Some answers take over one minute, and it is possible to get faster answers by limiting the number of generated tokens:


Note that all these little tests take up quite a bit of storage space:


I used a 32GB microSD card, but a 64GB microSD card would give more breathing room. If you need to delete models to save space, you’ll find them in /usr/share/hailo-ollama/models/blob/:


At that point, I was convinced I had been using a Raspberry Pi 5 with 8GB of RAM. I planned to install ollama to run the models above on the CPU, but with the lack of storage space, and my realization that my Pi only had 2GB of RAM, I had to find another Raspberry Pi 5. It turns out the board I’m using is the only one I have, as I don’t own the other models anymore. So instead, I ran the same tests on the Raspberry Pi Development Kit for CM5 with Broadcom BCM2712 CPU cores, 4GB RAM, and a 32GB eMMC flash.

The first step is to install ollama and check that it’s running fine.


I can now run ollama with deepseek-r1:1.5b and the verbose parameter to enable metrics:


An eval rate of 9.04 tokens/s is noticeably higher than the 6.7 tokens/s reported using the Hailo-10H accelerator. That’s disappointing. Nevertheless, I repeated all the tests as follows:

Qwen2:1.5b:


llama 3.2:3b:


qwen2.5:1.5b-instruct:


qwen2.5-coder:1.5b:


It should be noted that the results on the Hailo-10H might include the prompt evaluation time (usually quite fast compared to generation) on top of the answer time, so actual tokens/s might be slightly higher than reported. Nevertheless, the table below gives us an idea of the LLM performance of the Raspberry Pi AI HAT+ 2 against the Raspberry Pi 5/CM5 CPU.

Model                  | Raspberry Pi 5/CM5 CPU | Raspberry Pi 5 + AI HAT+ 2
DeepSeek R1 1.5B       | 9.04 tokens/s          | 6.72 tokens/s
Qwen2 1.5B             | 11.29 tokens/s         | 5.89 tokens/s
Llama3.2 3B            | 4.78 tokens/s          | 2.60 tokens/s
Qwen 2.5 instruct 1.5B | 11.73 tokens/s         | 6.74 tokens/s
Qwen 2.5 Coder 1.5B    | 10.26 tokens/s         | 8.06 tokens/s
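To put the gap in perspective, a quick Python snippet computing how much faster the CPU is for each model in the table above:

```python
# Tokens/s figures from the table above: (CPU, AI HAT+ 2)
results = {
    "DeepSeek R1 1.5B":       (9.04, 6.72),
    "Qwen2 1.5B":             (11.29, 5.89),
    "Llama3.2 3B":            (4.78, 2.60),
    "Qwen 2.5 instruct 1.5B": (11.73, 6.74),
    "Qwen 2.5 Coder 1.5B":    (10.26, 8.06),
}

for model, (cpu, hat) in results.items():
    print(f"{model}: CPU is {cpu / hat:.2f}x faster")
```

The CPU comes out roughly 1.3x to 1.9x faster than the Hailo-10H, depending on the model.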

Just looking at this table, the Raspberry Pi AI HAT+ 2 feels more like an AI decelerator than an AI accelerator! I expected the Raspberry Pi AI HAT+ 2 and its Hailo-10H AI accelerator to compete against the Rockchip RK1820/RK1828 AI accelerators, but those are in a different league, as shown in the table below (and their price will probably be a few hundred dollars).

RK1820 RK1828 LLM benchmarks
RK1820/RK1828 LLM and VLM benchmarks

However, there are still some benefits to using the Hailo-10H board for LLMs. First, the host does not need much RAM: I could run all models on a Raspberry Pi 5 2GB, since models are loaded into the 8GB RAM chip on the HAT.

Besides the low RAM usage, the CPU stays almost idle, as shown in the htop screenshot below, taken while running a DeepSeek R1 1.5B request.

HTOP Raspberry Pi 5 AI HAT+ 2
It looks like it could even run on one of the new Raspberry Pi 5 1GB RAM boards. Now compare this to running the same task with ollama on the Raspberry Pi Development Kit for CM5.

HTOP Raspberry Pi 5 CPU ollama deepseek r1

There’s high CPU usage, and memory usage is much higher since the model is loaded into the Pi’s RAM; those resources could otherwise be used for other tasks on the Raspberry Pi 5. In theory, you could also offload your LLM to another Raspberry Pi 5 8GB SBC with better performance and a cheaper price tag than the AI HAT+ 2. However, that solution would be larger and heavier and consume more power, which may matter for battery-powered robots, for instance.

  • DeepSeek-R1 1.5B, Ollama on Raspberry Pi CM5 devkit: 10.2 to 10.6 Watts
  • DeepSeek-R1 1.5B, hailo-ollama on Raspberry Pi 5 2GB with AI HAT+ 2: 7.2 to 7.6 Watts
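A rough back-of-the-envelope estimate of energy per token using the midpoint of each power range and the DeepSeek R1 1.5B tokens/s figures measured earlier (this ignores idle power and time-to-first-token, so take it as an approximation):

```python
# Midpoint power draw divided by tokens/s gives joules per generated token
cpu_watts, cpu_tps = (10.2 + 10.6) / 2, 9.04   # CM5 devkit, ollama on CPU
hat_watts, hat_tps = (7.2 + 7.6) / 2, 6.72     # Pi 5 2GB + AI HAT+ 2

print(f"CPU: {cpu_watts / cpu_tps:.2f} J/token")  # → CPU: 1.15 J/token
print(f"HAT: {hat_watts / hat_tps:.2f} J/token")  # → HAT: 1.10 J/token
```

So while the HAT draws less power, the energy consumed per generated token ends up roughly the same on both setups, since the CPU also finishes faster.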

I also contacted Eben Upton about the results, and he explained that it’s not surprising that the tokens-per-second figure is similar, as this is memory-bandwidth limited, and Raspberry Pi 5 and Hailo-10 have similar memory subsystems (LPDDR4X-4267). He also replied that the key goals of the AI HAT+ 2 product were:

  • To deliver a much improved time-to-first-token
  • To unload the host Raspberry Pi CPU (and memory) to execute other tasks

I don’t see any obvious data/parameters in the Hailo tools to check the time-to-first-token (TTFT), so I haven’t tested it. Having said that, we should expect some TTFT benchmarks from Raspberry Pi in the near future, as well as larger models for the AI HAT+ 2.

VLM (Vision Language Model) testing with the Raspberry Pi AI HAT+ 2

We’re not done yet, as we haven’t tested VLMs (Vision Language Models), which might be the optimal workloads for the Hailo-10H since they leverage both its computer vision and large language model capabilities. We can test this using the Hailo apps:


This will take a while, and if it ends successfully, it should show something like:


I had to run it twice because my microSD card was full, and I deleted the LLMs to free some space in order to complete the installation.

Let’s start the VLM chat demo that relies on a Raspberry Pi camera module to capture an image and the AI accelerator to describe it.


After a while, a window with the camera output will show up. We can press Enter to capture an image, then ask a question about it, or press Enter again to use the default prompt (“Describe the image”).

Raspberry Pi AI HAT+ 2 review VLM chat test

Here’s one of the outputs I got:


The camera colors are off, and there doesn’t seem to be any parameter to fix that:


I did try the show-fps option, but nothing showed up.

hailo-10h vlm chat demo Raspberry Pi Camera Module 3

Conclusion

The Raspberry Pi AI HAT+ 2 did not quite meet my expectations as an LLM accelerator, as I wrongly assumed that it would speed up performance in terms of tokens/s. Instead, it delivers Computer Vision performance similar to the Hailo-8-based Raspberry Pi AI HAT+ launched a couple of years ago, and adds support for LLMs and VLMs.

The performance with Large Language Models is actually somewhat lower than running them on the Broadcom BCM2712 CPU found in the Raspberry Pi 5, and the main benefits of the AI HAT+ 2 here are that it offloads processing from the SBC itself, using very little RAM and CPU, and consumes less power, which may be important for battery-powered applications. I used it with a Raspberry Pi 5 2GB, but it should also work with a Raspberry Pi 5 1GB, thanks to the 8GB RAM chip found on the HAT itself. The Time to First Token (TTFT) should also be much faster, but I didn’t find a way to test this with the provided tools. The best use case for the AI HAT+ 2 is probably Vision Language Models, since they leverage both the Computer Vision and Large Language Model capabilities of the Hailo-10H AI accelerator.

The Raspberry Pi AI HAT+ 2 sells for $130 compared to $110 for the first-generation Raspberry Pi AI HAT+. Since both have about the same AI vision performance, and the main benefits of the AI HAT+ 2 over running LLMs directly on the Pi 5 are offloading (since Hailo-10H and CPU performance are about the same) and lower power consumption, I think the new Hailo-10H HAT+ is probably best suited to security camera systems like Frigate, or robots equipped with cameras that need to understand their environment through VLMs.
