Archive

Posts Tagged ‘AArch64’

64-bit ARM (Aarch64) Instructions Boost Performance by 15 to 30% Compared to 32-bit ARM (Aarch32) Instructions

March 1st, 2016 12 comments

Yesterday was quite an eventful day with the launch of two low cost 64-bit ARM development boards, namely Raspberry Pi 3 and ODROID-C2, and as usual there were some pretty interesting discussions related to the launch of the boards in the comments section. One of the subjects that came up is that while the Raspberry Pi 3 board is using a 64-bit processor, the operating systems are still compiled with 32-bit instructions (Aarch32) and even optimized for ARMv6, and they intend to keep it that way according to an interview with Eben Upton:

Eben readily admits that not all the capabilities of the new parts are going to be used at launch, however. “Although it is a 64‑bit core, we’re using it as just a faster 32-bit core,” he reveals about the Pi 3’s central processing unit. “I can imagine there’d be some real benefits [to 64-bit code]. The downside is that you do really create a separate world. To access that benefit, you’d have to have two operating systems. I’m hoping that someone will come and demonstrate to me that this is a good idea. But there are some really compelling advantages to still being basically ARMv6, and because it’s [Cortex-]A53 it’s a really good 32‑bit processor.”

So the clear advantage of running ARMv6 32-bit code is that a single image can be used for all Raspberry Pi boards, while if they had to optimize code for each board, they’d have one image for Raspberry Pi (ARMv6), one for Raspberry Pi 2 (ARMv7), and a final one for Raspberry Pi 3 (ARMv8), and obviously that would require a lot of work behind the scenes. In theory, there should be a performance advantage to running 64-bit ARM instructions, but the question is how much?

ARM brings some perspective to performance improvement in their presentation “ARMv8: Advantages for Android”, where they compare the performance of Aarch64 (64-bit ARM instructions) against Aarch32 (32-bit ARM instructions) by running benchmarks compiled for either instruction set on the Juno development board.

The first charts show native (C/C++ code) performance is between 15% and about 20% faster with Aarch64 in bionic benchmarks, and in Antutu 5.0 single thread and multi-thread CPU tests.

The second chart shows ART (Java runtime) performance is also about 15% better with Aarch64 using Quadrant 2.0 CPU score, and close to 30% faster with Linpack multi-threaded benchmark.

Broadcom BCM2837 processor’s Cortex A53 cores are likely to be further impacted since they are running code compiled for the older ARMv6, which is slower than ARMv7. Let’s take another fun example. Raspberry Pi 3 benchmarks released on MagPi reveal sysbench completes in 49.02 seconds for the multi-threaded CPU test, and tkaiser, an active developer for the armbian project, ran sysbench on a Pine A64 development board with Ubuntu 16.04 64-bit, and the results are quite surprising considering Allwinner A64 is also a quad core Cortex A53 processor @ 1.2 GHz:

So it took only 3.25 seconds on Pine A64 with ARMv8 instructions compared to 49.02 seconds on Raspberry Pi 3 with ARMv6 instructions, so it appears that if you are specifically looking for prime numbers it does pay big time (15 times faster) to switch to Aarch64 instructions. Bear in mind that the sysbench command line benchmark has options that can affect the results, and sadly we don’t have the exact command line used for Raspberry Pi 3, but they’ve most likely used the default options as above (maximum prime number: 10,000), since another person ran the benchmark with 20,000 max on RPi3, which completed in around 119 seconds.
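
For readers unfamiliar with this kind of CPU test, here is a minimal Python sketch of the work involved: counting primes up to a limit by trial division. It is an illustration under my own assumptions, not sysbench’s actual code; the names `count_primes` and `speedup` are hypothetical. The last lines redo the speedup arithmetic from the two timings quoted above.

```python
import math
import time


def count_primes(max_prime: int) -> int:
    """Count primes in [3, max_prime] by trial division (illustrative only)."""
    found = 0
    for candidate in range(3, max_prime + 1):
        limit = math.isqrt(candidate)
        # A candidate is prime if no divisor in [2, sqrt(candidate)] divides it.
        if all(candidate % divisor for divisor in range(2, limit + 1)):
            found += 1
    return found


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    start = time.perf_counter()
    primes = count_primes(10_000)
    elapsed = time.perf_counter() - start
    print(f"{primes} primes found in {elapsed:.2f} s")
    # Timings quoted in the post: Raspberry Pi 3 (49.02 s) vs Pine A64 (3.25 s).
    print(f"speedup: {speedup(49.02, 3.25):.1f}x")
```

Absolute timings from an interpreted sketch like this are not comparable to sysbench results; only the structure of the workload and the ratio calculation carry over.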

Which specific improvements of ARMv8 may bring the extra performance? Reader and commenter “Blu” explains:

Well, for one, compiler’s autovectorization actually works with aarch64 NEON, whereas in armv7 you had mostly to rely on manual vectorization via inline asm. Another big win is the twice-larger GPR & FPR files (when it comes to fp64: D16 -> D32), largely reducing register pressure in compiled (and not only) code. Last but not least, recent compilers have been more focused on AArch64, where they could produce better code vs armv7 not so much because of hw resource discrepancies, but because more man-effort went into AArch64 backends (and the arch provides a bunch of small tweaks that make compiler writer’s lives easier).

To sum it up, one can observe a significant speedup from armv7 to AArch64 for both objective (i.e. larger hw resources) and subjective (i.e. greater man-effort) reasons.

Now the Raspberry Pi 3 is not the only platform to use 32-bit operating systems, as most Android devices and boards I’ve tested so far, excluding DragonBoard 410c, combine a 64-bit kernel with 32-bit user space. The ODROID-C2 board, however, will be supported with Ubuntu 16.04 64-bit ARM (aka ARM64).

There’s however a side effect of compiling code with 64-bit instructions: the binaries get bigger. Another reader, “Jon”, compiled code for Rockchip RK3128 Cortex A7 processor (ARMv7/32-bit) and Pine A64 Cortex A53 processor (ARMv8/64-bit), and found some large differences in size.

| Binary | ARMv7 Size (Bytes) | ARMv8 Size (Bytes) | Ratio |
|---|---|---|---|
| libcrypto.so | 1,052,920 | 1,673,400 | 1.59x |
| toolbox Android 5.1 | 150,836 | 255,280 | 1.69x |

So in case you are really tight on storage or memory, 32-bit code might be a better option.
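
The ratios in the table can be checked with a few lines of Python. This is just the arithmetic on the sizes Jon reported; the helper name `size_ratio` is hypothetical.

```python
# Sizes in bytes, copied from the table above: (ARMv7, ARMv8).
SIZES = {
    "libcrypto.so": (1_052_920, 1_673_400),
    "toolbox Android 5.1": (150_836, 255_280),
}


def size_ratio(armv7_bytes: int, armv8_bytes: int) -> float:
    """Return how many times larger the ARMv8 binary is than the ARMv7 one."""
    return armv8_bytes / armv7_bytes


if __name__ == "__main__":
    for name, (armv7, armv8) in SIZES.items():
        extra_kib = (armv8 - armv7) / 1024
        print(f"{name}: {size_ratio(armv7, armv8):.2f}x (+{extra_kib:.0f} KiB)")
```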

Amlogic S905 Source Code Published – Linux, U-Boot, Mali-450 GPU and Other Drivers

November 19th, 2015 33 comments

Amlogic has an open Linux website where they regularly release GPL source code, and with Amlogic S905 devices coming to market, they’ve released a few tarballs at the beginning of the month including Linux 3.14 source code, U-boot source code, and Mali-450MP GPU kernel source code (obviously not userspace), as well as some other drivers for WiFi, NAND flash, PMU, TVIN, etc.

Let’s get to the download links:

I quickly tried to build the Linux source. If you’ve never built a 64-bit ARM kernel or app before, you’ll first need to install the toolchain. I installed the one provided with Ubuntu 14.04:

Now extract the tarball and enter the source directory:

At first I had a build failure due to a missing directory, so I created it, and used the default config for Amlogic S905/S912 (in arch/arm64/configs), before building the Linux kernel.

and it ended well:

So that’s a good starting point for anybody wanting to work on the Android or Linux kernel…

Unrelated to Amlogic S905/Meson64, but I’ve also noticed some OpenWRT packages and rootfs on the Amlogic website that were released a little earlier this year. So either some people are using Amlogic Sxxx processors with OpenWRT, or Amlogic is working on a router chip that I missed. Probably the former.

Thanks to Olin.

Google Releases Android L (Lollipop?) Developer Preview

June 26th, 2014 2 comments

Google I/O is taking place right now in San Francisco, and the company made several announcements. Although they have not announced the full codename of Android 5.0, referring to the next version as “Android L” (Lollipop would be nice though), they’ve already documented the key changes made to Android L, and a developer preview will be released later today (26 June), together with binary images for Google Nexus 5 and Nexus 7.

Besides the smartphone and tablet developer preview, there will be 3 other SDKs for Android L:

  • Android Wear SDK – Android for wearables with sync notifications, wearable apps, data transfer APIs, and voice actions, e.g. “Ok Google, call mum”.
  • Android TV Preview SDK – Android for TVs with pre-built fragments for browsing and interacting with media catalogs, in-app search, and recommendations.
  • Android Auto SDK – Android for the car with apps featuring consistent user experience between vehicles, and minimizing distractions.

I’ll go through various software and hardware announcements for Android Wear and TV in separate blog posts, and probably skip Android Auto for now.

So what’s new in Android L Developer Preview?

Material Design

Material Design is a new design language that will let developers create apps that look similar to Google Now. Google chose the name “Material” as it is apparently inspired by real materials such as paper and ink. The Android L user interface will be entirely designed with Material Design. The best is to look at an example.

Gmail Now vs Gmail “L”

On the left, we’ve got the current Gmail app, and on the right the newly designed app for Android L. Lots of it looks like cosmetic changes, but you’ll have noticed the three dot and new mail icons are gone, and all menus will be accessible via the top left icon. There are also some light and shadow effects that will make users feel like they’re touching real elements.

More details can be found in this Material Design presentation (PDF).

Improved Notifications

Notifications have also changed with a new design based on Material, and the ability to display notifications on the lock screen.

I understand lockscreen notifications are optional, and if you don’t like them you can hide them from the lock screen using visibility controls. As you can see from the screenshot above, it works very similarly to Google Now, with cards that you can discard once you’re done. Notifications will also be able to pop up in games or other full screen apps, and you’ll be able to take action within the notification, for example by declining or accepting a video call request.

Recents

The list of recent apps will become the list of recent everything, simply called “Recents”, as it will include apps, web pages, and documents.

Better Tools for Improving Battery Life

As devices become more powerful, they also become more power hungry despite efforts by SoC designers to reduce energy usage. Badly programmed apps are however the main culprit of short battery life, so Google has introduced Project Volta to help users and developers optimize power consumption. Developers can use the “Battery Historian” tool to monitor the power consumption of different processes, and see which hardware block (e.g. cellular radio) is currently being used.

Users will also have their own app / feature dubbed “Battery Saver” to improve battery life, and Google claims their Nexus 5 should be able to last an extra 90 minutes on a charge with Battery Saver enabled. This is achieved by reducing the performance of the device once the battery has dropped below 20% charge. At that time, a notification would pop up to let the user select whether to enable Battery Saver mode.

Under the hood improvements

As has been widely reported, Google killed Dalvik in a recent commit in AOSP, and ART will become the default Java runtime, using ahead-of-time compilation for speedier application loading times and memory usage improvements. Google also claims it provides true cross platform support for ARM, MIPS and x86.

Android L will support 64-bit instructions including ARMv8, x86-64 and MIPS64. This will provide a larger number of registers, and increased addressable memory space. Java developers won’t need to change their apps for 64-bit support. One of the first Android64 devices is likely to be the Nexus 9 tablet powered by Nvidia Tegra K1 Denver as previously reported.
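
To put “increased addressable memory space” into numbers, the short Python sketch below computes the theoretical limit implied by pointer width alone. The function name `addressable_bytes` is hypothetical, and the 64-bit figure is only a ceiling: real processors and kernels typically expose fewer address bits than the full 64.

```python
def addressable_bytes(pointer_bits: int) -> int:
    """Return the number of distinct byte addresses a pointer of this width can hold."""
    return 2 ** pointer_bits


if __name__ == "__main__":
    gib = 2 ** 30  # bytes in one gibibyte
    eib = 2 ** 60  # bytes in one exbibyte
    print(f"32-bit: {addressable_bytes(32) // gib} GiB")
    print(f"64-bit: {addressable_bytes(64) // eib} EiB (theoretical ceiling)")
```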

On the graphics side, Android L adds support for OpenGL ES 3.1, and includes Android Extension Pack for developers with tessellation and geometry shaders and other features that should bring PC and console class graphics to Android games according to Google.

Via Anandtech and Liliputing

A Selection of FOSDEM 2013 Events

February 1st, 2013 No comments

FOSDEM is a 2-day event (or 3 if you include the Friday beer event) where over 5,000 members of open source communities meet, share ideas and collaborate. It’s free to attend, and there’s no registration, so you just show up. FOSDEM 2013 takes place on Feb 2-3 (yep, this week-end) in Brussels.

There are 7 main tracks where sessions are organized:

  • Operating systems
  • Open source challenges
  • Security
  • Beyond operating systems
  • Web development
  • Miscellaneous
  • Robotics

There are also keynotes and devrooms for a total of 488 sessions. Developer rooms that may be of particular interest to readers of this blog are:

All in all that’s a lot of sessions, and even though I won’t attend, I’m going to select a few from the main tracks:

This talk introduces the Fedora ARM Project and in particular the work we are doing to bring Fedora to emerging 64-bit ARM server systems.

Where are we today, one year after the unveiling of the Lima driver. This talk will cover the Lima driver (ARM Mali 200/400), but also other open source GPU driver projects such as the freedreno driver (Qualcomm Adreno), open source driver for Nvidia Tegra, etnaviv project (Vivante GC) and cover the status for Broadcoms Videocore and Imaginations PowerVR GPUs.

Based on the speaker’s experience of getting the support for the new Armada 370 and Armada XP ARM processors from Marvell into the mainline Linux kernel, this talk will detail the most important steps involved in this effort, and through this, give an overview of those changes and summarize the new rules for ARM Linux support.

  • Sunday 11:00 – 11:50 – Firefox OS by Jonas Sicking

Firefox OS is the next product being developed by Mozilla. It’s an open source OS based on the web and following the principles which have made the web a success. A phone running recent builds of Firefox OS (it’s not a finished product yet) will be demoed, and the technologies and ideas behind Firefox OS will be discussed.

The systemd project is now two years old (almost three). It found adoption as the core of many big community and commercial Linux distributions. It’s time to look back what we achieved, what we didn’t achieve, how we dealt with the various controversies, and what’s to come next.

How Aldebaran Robotics is using open source on their NAO robot.

This talk will provide an overview of the Robot Operating System (ROS), an open software integration framework for robots.

This talk describes how the automotive industry has moved to embedded Linux and Open Source to develop the next generation of In-Vehicle Infotainment (IVI) and how it has met the challenges along the way.

What, why, when, where and how SecureBoot changes the way we build F/LOSS

 

Linux 3.7 Release

December 11th, 2012 No comments

Linus Torvalds has announced the release of Linux Kernel 3.7:

Whee. After an extra rc release, 3.7 is now out. After a few more trials at fixing things, in the end we ended up reverting the kswapd changes that caused problems. And with the extra rc, I had decided to risk doing the buffer.c cleanups that would otherwise have just been marked for stable during the next merge window, and had enough time to fix a few problems that people found there too.

There’s also a fix for a SCSI driver bug that was exposed by the last-minute workqueue fixes in rc8.

Other than that, there’s a few networking fixes, and some trivial fixes for sparc and MIPS.

Anyway, it’s been a somewhat drawn out release despite the 3.7 merge window having otherwise appeared pretty straightforward, and none of the rc’s were all that big either. But we’re done, and this means that the merge window will close on Christmas eve.

Or rather, I’ll probably close it a couple of days early. For obvious reasons. It’s the main commercial holiday of the year, after all.

So aim for winter solstice, and no later. Deal? And even then, I might be deep into the glögg.

Linux 3.6 brought updates to the Btrfs & ext4 file systems, some initial work for the SMBv2 protocol, networking improvements, safe swap over NFS/NBD and a VFIO driver for device access from userspace drivers.

Linux 3.7 brings the following key changes:

  • ARM multi-platform support – The Linux ARM implementation has added “multi-platform” support – the ability to build a unified ARM kernel image that can boot on multiple hardware platforms. Read Supporting multi-platform ARM kernels for details.
  • ARM 64 bit support – The new 64 bit ARM CPUs (ARMv8 architecture – AArch64) can run 32 bit code, but the 64 bit instruction set is completely new, and the Linux support has been implemented as a completely new architecture. For details, read Supporting 64-bit ARM systems.
  • Cryptographically signed kernel modules – Linux 3.7 allows kernel modules to be optionally signed, in order to completely disable the loading of modules that have not been signed with the correct key. This feature is useful for security purposes, as an attacker who gains root user access will not be able to install a rootkit using the module loading routines. You may want to check out Loading signed kernel modules for more information.
  • Btrfs updates – fsync() speedups, removal of the hard link limits inside a single directory (from 20 to 65K), hole punching and chattr per-file NOCOW support.
  • Preliminary version of perf trace – This tool looks somewhat like ‘strace’, but instead of using ptrace(), it uses the Linux tracing infrastructure. “perf trace” shows the events associated with the target, initially syscalls, but also other system events like pagefaults, task lifetime events, scheduling events, etc.
  • TCP Fast Open (Server Side) – Linux added TCP Fast Open support for clients in Kernel 3.6, and this release adds Fast Open support for the server side. “Fast Open” is an optimization technique that can result in speed improvements of between 4% and 41% in the page load times on popular web sites. You can read TCP Fast Open: expediting web services for more information.
  • Experimental SMBv2 protocol support – The cifs networking filesystem has added support for version 2 of the SMB protocol. The SMBv2 protocol is the successor to the CIFS/SAMBA network file sharing protocols, and is the native file sharing mechanism for Windows OSs since Windows Vista. SMBv2 enablement will eventually allow users better performance, security and features that would not be possible with previous protocols.
  • NFS v4.1 support no longer experimental – NFS v4.1 (RFC 5661) has been part of the kernel as an experimental feature for a while, but has now been “upgraded” to a stable release. The main feature of NFS v4.1 is pNFS (parallel NFS) which can take advantage of clustered server deployments.
  • Virtual extensible LAN tunneling protocol – vxlan (RFC draft) is a tunneling protocol that transfers Layer 2 ethernet packets over UDP, and is often used to tunnel virtual network infrastructure in virtualized environments. See VXLAN for Linux for details.
  • Intel “supervisor mode access prevention” support – Supervisor Mode Access Prevention (SMAP) is a new security feature that will be available in future Intel processors.
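
To make the VXLAN item above a little more concrete, here is a minimal Python sketch of how a Layer 2 frame could be wrapped in a VXLAN header before being sent as a UDP payload. It assumes the 8-byte header layout from the published VXLAN specification (RFC 7348, which followed the draft mentioned above): one flags byte with the “VNI valid” bit, 24 reserved bits, a 24-bit network identifier (VNI), and 8 reserved bits. The function names are hypothetical, and this only builds bytes in memory; it says nothing about how the kernel driver is implemented.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the valid flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


def encapsulate(vni: int, ethernet_frame: bytes) -> bytes:
    """Prefix an Ethernet frame with a VXLAN header (this is the UDP payload)."""
    return pack_vxlan_header(vni) + ethernet_frame


if __name__ == "__main__":
    payload = encapsulate(42, b"\x00" * 14)  # dummy 14-byte Ethernet header
    print(payload[:8].hex(), unpack_vni(payload))
```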

Further details on Linux 3.7 are available on Kernelnewbies.org.
