More Technical Details & Benchmarks about Nvidia Tegra K1 “Denver” 64-bit ARM SoC


The 32-bit version of Nvidia Tegra K1 has generally received good reviews in terms of performance, especially GPU performance, and the company has also provided good developer documentation and Linux support, including open source drivers for the Kepler GPU (GK20A) found in the SoC. But as initially announced, Tegra K1 will also get a 64-bit ARM version codenamed “Denver”, and Nvidia provided more details at the Hot Chips conference.

The 64-bit Tegra K1 will still feature a 192-core Kepler GPU, but replaces the four ARM Cortex A15 cores found in the 32-bit version with two ARMv8 “Project Denver” cores custom-designed by Nvidia. The multi-core performance of the dual core 64-bit Tegra K1 @ 2.5 GHz may end up being equivalent to that of the quad core 32-bit Tegra K1 @ 2.1 GHz, but single core performance will be much better thanks to a 7-way superscalar microarchitecture (vs 3-way for Cortex A15), as well as a 128KB L1 instruction cache, a 64KB L1 data cache, and a 2MB L2 cache.
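As a rough sanity check of the multi-core claim, here is a deliberately simplified model. It assumes multi-core throughput scales linearly with core count × clock × per-core work per cycle, and it ignores memory bandwidth, thermal limits and imperfect parallel scaling. Under that model, the per-clock advantage each Denver core would need over a Cortex A15 core for the two chips to be equivalent is ($P$ is per-core work per cycle, a symbol introduced here for illustration):

$$
2 \times 2.5\,\text{GHz} \times P_{\text{Denver}} = 4 \times 2.1\,\text{GHz} \times P_{\text{A15}}
\quad\Longrightarrow\quad
\frac{P_{\text{Denver}}}{P_{\text{A15}}} = \frac{8.4}{5.0} = 1.68
$$

In other words, each Denver core would need to do roughly 68% more work per clock than a Cortex A15 core. That seems plausible given the 7-way vs 3-way issue width (a peak ratio of about 2.3×), although sustained IPC is normally well below peak issue width.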

To further improve performance, Nvidia implemented a new technique called “Dynamic Code Optimization”. It optimizes frequently used routines into “tuned microcode-equivalent” routines and stores them in a dedicated 128MB optimization cache in main memory. The optimization is done in software the first time, and its overhead is said to be outweighed by the performance gains from the optimized code. Dynamic Code Optimization works with all standard ARM-based applications, requires no customization from developers, and adds no power consumption compared to other ARM mobile processors.
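Nvidia only describes the mechanism at a high level. The sketch below is therefore a toy Python model of the general idea: profile routines, optimize a routine once when it becomes hot, then reuse the cached optimized version. It is not how Denver actually works. Real Denver turns ARM code into microcode-equivalent routines, and this sketch does not attempt to model that. The two-op instruction format, the `hot_threshold` value, and all names (`interpret`, `fuse`, `optimize`, `DynamicOptimizer`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "instructions": ("add", k) or ("mul", k) applied to an accumulator.
Op = Tuple[str, int]
Routine = List[Op]


def interpret(routine: Routine, x: int) -> int:
    """Slow path: decode and execute each op one at a time."""
    for op, k in routine:
        if op == "add":
            x += k
        elif op == "mul":
            x *= k
        else:
            raise ValueError(f"unknown op {op!r}")
    return x


def fuse(routine: Routine) -> Routine:
    """Merge adjacent ops of the same kind: (x+a)+b == x+(a+b), (x*a)*b == x*(a*b)."""
    fused: Routine = []
    for op, k in routine:
        if op not in ("add", "mul"):
            raise ValueError(f"unknown op {op!r}")
        if fused and fused[-1][0] == op:
            prev = fused[-1][1]
            fused[-1] = (op, prev + k if op == "add" else prev * k)
        else:
            fused.append((op, k))
    return fused


def optimize(routine: Routine) -> Callable[[int], int]:
    """One-time cost: fuse ops and pre-decode them into closures (no per-call decoding)."""
    steps = [(lambda x, k=k: x + k) if op == "add" else (lambda x, k=k: x * k)
             for op, k in fuse(routine)]

    def run(x: int) -> int:
        for step in steps:
            x = step(x)
        return x

    return run


class DynamicOptimizer:
    """Counts executions per routine and caches an optimized version once it is hot."""

    def __init__(self, hot_threshold: int = 3) -> None:
        self.hot_threshold = hot_threshold  # hypothetical value
        self.counts: Dict[str, int] = {}
        self.cache: Dict[str, Callable[[int], int]] = {}

    def execute(self, name: str, routine: Routine, x: int) -> int:
        optimized = self.cache.get(name)
        if optimized is not None:
            return optimized(x)  # cache hit: reuse the optimized routine
        self.counts[name] = self.counts.get(name, 0) + 1
        if self.counts[name] >= self.hot_threshold:
            self.cache[name] = optimize(routine)  # pay the optimization cost once
        return interpret(routine, x)  # this call still runs on the slow path


if __name__ == "__main__":
    dco = DynamicOptimizer(hot_threshold=3)
    loop_body: Routine = [("add", 1), ("add", 2), ("mul", 2), ("mul", 5)]
    print([dco.execute("loop_body", loop_body, i) for i in range(5)])  # [30, 40, 50, 60, 70]
```

The results are the same before and after the routine enters the cache. Only the execution path changes, which mirrors the claim that the optimization is transparent to applications.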

With new low latency power-state transitions (CC4 cluster retention), extensive power gating, and dynamic voltage and clock scaling based on workload, Nvidia claims its dual core 64-bit Tegra K1 processor will outperform existing quad and octa core processors on most mobile workloads. Nvidia also says it should even rival mainstream PC-class CPUs at much lower power consumption. You can find some benchmark results below comparing Tegra K1 32-bit performance to Tegra K1 Denver, Celeron N2910 (Bay Trail), Apple A7, Qualcomm Krait-400, and Haswell Celeron 2955U.

Nvidia Tegra K1 64-bit Benchmarks Against Competition

Another piece of good news is that Denver will be pin-to-pin compatible with the original Tegra K1, which should make it fairly easy for OEMs to upgrade their products. Nvidia is currently working on Android “L” for the 64-bit Tegra K1, and products should be available by the end of the year. You can find more details in a white paper called “NVIDIA Charts Its Own Path to ARMv8”.


Comment by davidlt:

Hopefully they will release a Jetson dev board with this 64-bit SoC and 4GB of RAM. That would be amazing. Then it's just a matter of figuring out how to boot Fedora on it. It's interesting how they increased IPC for frequently executed code and improved how ILP is found. They moved the out-of-order logic into software and use an in-order pipeline: software (firmware) optimizing software (re-ordering, register renaming, etc.). All the details are in the white paper ($0.0).