
Posts Tagged ‘armv8’

Setup Guide & Mini Review of BQ Aquaris M10 Ubuntu Edition Tablet from a Developer’s Perspective

April 30th, 2016 5 comments

BQ Aquaris M10 Ubuntu Edition is the first officially supported Ubuntu tablet on the market. Blu, a frequent commenter on this blog, has purchased the Full HD version, and in the guest post below he shares his experience setting up the device for development purposes, before briefly giving his overall impressions of the tablet itself.

Quick introduction

Ever since I had to retire my trusty-but-ancient ARM notebook (a Genesi Efika iMX51), I’ve been looking for a new ARM notebook or perhaps a 2-in-1 device that I could use for development on the go. The basic requirements are long battery life, passive cooling and a reasonable price. Also, Just Enough Power™ for running vim, a couple of toolchains (gcc/clang with gold) and, well, enough grunt to run my coding experiments. Naturally, the BQ M10 Ubuntu Edition immediately caught my attention, to the point that I placed an order, which was delivered this past week. Allow me to share my impressions of the M10 so far.


First things first: turning the M10 into a coder’s productivity device

There is plenty of know-how on the web about how to ‘unlock’ an Ubuntu Touch device into a full-fledged Linux box, but here we will describe the minimum steps to achieve this, and without the need for a desktop. The M10 needs to be on a Wi-Fi network with Internet access, though.

From the Ubuntu Store, install the terminal application (access to the store requires registration with a valid email address). Once we have that, we have proper control over our device via the on-screen kbd, or via a physical Bluetooth or micro USB kbd.

A look at /proc/cpuinfo immediately shows that the device hosts a quad Cortex-A53 r0p3 (CPU part 0xd03), and that the userspace is armhf: the ‘CPU architecture’ field in /proc/cpuinfo should say ‘AArch64’ for an arm64 userspace, but instead it says ‘8’, as it does on an armhf userspace.
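To repeat that check on another board, the relevant fields can be pulled out of /proc/cpuinfo programmatically. Below is a minimal Python sketch; the part-number table is a small hand-picked subset for implementer 0x41 (ARM Ltd.), the helper names are our own, and the ‘AArch64’ vs ‘8’ rule is simply the heuristic described above, which may not hold for every kernel version.

```python
# Sketch: summarise the CPU identification fields of /proc/cpuinfo.
# ARM_PARTS is a small, hand-picked subset of ARM Ltd. part numbers.
ARM_PARTS = {
    0xC09: "Cortex-A9",
    0xD03: "Cortex-A53",
    0xD07: "Cortex-A57",
    0xD08: "Cortex-A72",
}


def parse_cpuinfo(text: str) -> dict:
    """Map each 'key : value' field to the first value seen (i.e. CPU 0)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def describe_cpu(text: str) -> str:
    f = parse_cpuinfo(text)
    if f.get("CPU implementer") == "0x41":  # 0x41 = ARM Ltd.
        part = int(f.get("CPU part", "0"), 16)
        name = ARM_PARTS.get(part, f"ARM part {part:#x}")
    else:
        name = "unknown core"
    variant = int(f.get("CPU variant", "0"), 16)  # printed in hex, e.g. 0x0
    revision = int(f.get("CPU revision", "0"))    # printed in decimal
    arch = f.get("CPU architecture", "?")
    # Heuristic from the post: 'AArch64' suggests an arm64 userspace,
    # '8' an armhf (32-bit) userspace on ARMv8 hardware.
    if arch == "AArch64":
        userspace = "arm64"
    elif arch == "8":
        userspace = "armhf"
    else:
        userspace = "unknown"
    return f"{name} r{variant}p{revision}, userspace: {userspace}"
```

On the tablet, `describe_cpu(open("/proc/cpuinfo").read())` should report something like `Cortex-A53 r0p3, userspace: armhf`, going by the values quoted above.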

Typing on the on-screen kbd is a mere curiosity, so until we get ourselves a decent Bluetooth kbd or a micro USB-to-female-USB adapter (for a standard USB kbd), we will need something better to type on. Getting an ssh server running on the device takes minimal effort: the package is already installed, it just needs to be enabled. We also need a public ssh key ready on the desktop machine, as the ssh server is factory-configured for public-key access only. So, assuming we have our public key handy on the desktop, we need to do the following in our M10 home:

Now we can ssh into the tablet by its IP address and enjoy a proper kbd. Incidentally, the final step of actually enabling the ssh server can also be achieved by turning on the tablet’s Developer mode in the About This Device tab of the system settings.
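Because the server only accepts keys, a common stumbling block is file permissions: with OpenSSH’s default StrictModes setting, sshd ignores authorized_keys if that file, ~/.ssh or the home directory is writable by group or others. The Python helper below is a hypothetical checker for just those permission bits; it does not look at ownership or at any tablet-specific sshd settings.

```python
# Sketch: flag paths whose permissions would likely make sshd (StrictModes,
# the OpenSSH default) ignore ~/.ssh/authorized_keys. Ownership is not checked.
import os
import stat


def writable_by_others(path: str) -> bool:
    """True if the group or others may write to path."""
    return bool(os.stat(path).st_mode & (stat.S_IWGRP | stat.S_IWOTH))


def strictmodes_problems(home: str) -> list:
    """Return human-readable problems for home, ~/.ssh and authorized_keys."""
    ssh_dir = os.path.join(home, ".ssh")
    keys = os.path.join(ssh_dir, "authorized_keys")
    problems = []
    for path in (home, ssh_dir, keys):
        if not os.path.exists(path):
            problems.append(f"missing: {path}")
        elif writable_by_others(path):
            problems.append(f"group/world-writable: {path}")
    return problems
```

Running `strictmodes_problems(os.path.expanduser("~"))` on the tablet and getting an empty list means at least these modes are not the reason a key is refused.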

A quick look at the mounted filesystems shows that the rootfs is mounted read-only, which is a show-stopper for any apt-get we plan to do next. So we need to enable read-write mode on the rootfs via:

Please note that the system will automatically reboot after this command; our rootfs will be write-enabled after that. Then we can:

Just be warned that keeping the rootfs in a write-enabled state actually disables OTA updates of the tablet fw. So once we’re done with apt-get for the day, we might want to:
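Whichever state we leave the tablet in, it helps to confirm what the rootfs is currently mounted as, for instance after the automatic reboot. A minimal Python sketch that reads text in the /proc/mounts format (device, mount point, fs type, options, dump, pass) and reports whether a mount point is read-only; the function names are our own.

```python
# Sketch: is a mount point read-only? Parses text in the /proc/mounts format:
# device, mount point, fs type, comma-separated options, dump, pass.


def mount_options(mounts_text: str, mount_point: str = "/") -> list:
    """Options of the last (i.e. topmost) entry mounted at mount_point."""
    options = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Note: spaces in paths appear octal-escaped (\040) in /proc/mounts.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text: str, mount_point: str = "/") -> bool:
    """True if the topmost mount at mount_point carries the 'ro' option."""
    return "ro" in mount_options(mounts_text, mount_point)
```

For example, `is_read_only(open("/proc/mounts").read())` should return True while the rootfs is still in its factory read-only state.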

For reference, these are the g++ and clang++ versions that we can get on the tablet currently from the standard vivid repositories:

Running (natively-built) binaries from within our home folder takes some tinkering, though. The reason is AppArmor, which is factory-configured to disallow the execution of apps from the /userdata mount point (/userdata/user-data is where our home is). To work around this, we need to find the AppArmor profile of our indispensable terminal app and edit it to allow the execution of binaries from our home.

Please note the actual version of the terminal app might be different. In there we find the following lines:

And add to them:

Followed by:

So, now we can build and test our code on the M10. A couple of notes:

  • Since this is an armhf userland, i.e. 32-bit ARM, the default target of gcc/g++ is Thumb-2 (as per Canonical’s worldview); one might want to pass -marm to the compiler for a few percent more performance.
  • There’s a 0.5GB compressed RAM drive taken from our precious 2GB of RAM and used as swap. Whether that’s a beneficial decision for our purposes is not clear.
  • The Cortex-A53 in the MT8163A (i.e. the 1.5GHz version) appears to be somewhat slower in this configuration than other vendors’ A53s of the same revision (e.g. Rockchip’s RK3368 @ 1.51GHz). I don’t know what to attribute this to yet. It could be due to intricacies of the scheduler and/or performance manager, though the latter should be bog-standard cpufreq. Or it could be the lxc container with a minimal Android that provides the display painting services. Or it could be a hw difference somewhere in the cache hierarchy. An investigation is pending in the indefinite future.
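Two of the notes above can be checked from a shell on the tablet; here is a small Python sketch of both, with helper names of our own. The first parses the output of `gcc -dM -E`, which lists the compiler’s predefined macros: GCC defines `__thumb__` (and `__thumb2__`) when generating Thumb code, so comparing the default with `-marm` shows which instruction set you actually get. The second totals the swap devices in /proc/swaps, assuming the compressed RAM drive shows up as a /dev/zram* device.

```python
# Sketch: check the Thumb-2 default and the size of the zram swap.
# Helper names are our own; nothing here is tablet-specific.
import subprocess


def macro_names(dump: str) -> set:
    """Collect macro names from `gcc -dM -E` output ('#define NAME value')."""
    names = set()
    for line in dump.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def predefined_macros(compiler: str = "gcc", *flags: str) -> set:
    """Run the compiler on empty C input and return its predefined macros."""
    result = subprocess.run(
        [compiler, *flags, "-dM", "-E", "-x", "c", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return macro_names(result.stdout)


def targets_thumb(names: set) -> bool:
    """GCC defines __thumb__ when generating Thumb code, but not with -marm."""
    return "__thumb__" in names


def zram_swap_kib(proc_swaps: str) -> int:
    """Sum the Size column (KiB) of /dev/zram* entries in /proc/swaps."""
    total = 0
    for line in proc_swaps.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("/dev/zram"):
            total += int(fields[2])
    return total
```

On an armhf userland like this one, `targets_thumb(predefined_macros())` would be expected to return True and `targets_thumb(predefined_macros("gcc", "-marm"))` False, while a 0.5GB zram swap should show up as roughly 524,288 KiB.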

Informal impressions

The M10 is a solidly-built piece of ‘luggable’ electronics, AKA portable things you always lug along in your backpack for 24/7 accessibility. Whenever I’ve found myself wishing for something more in the M10, it’s normally been a sw issue. Back to my original criteria for a productivity portable: its battery life is nice, lasting between one and two days of trivial coding use (vim, build, test, repeat). The pricing is slightly on the high side for this class of hw, IMO, but hey, early adopters’ premium (which apparently I was willing to pay). For the price one gets a cluster of Cortex-A53 at (almost) industry-standard performance levels, 2GB of RAM and 16GB of eMMC (with ~150MB/s read BW). The quality of the screen also bears mentioning: it’s quite nice, better than that of my aging Acer netbook.

That said, the things that need improving going forward:

  • Android needs to go; Canonical need to get their act together and provide a proper 100% Linux on this class of devices. Whether that includes ‘muscling’ vendors like MediaTek into conformance or just paying for the development of native graphics stacks is rather irrelevant to the end user.
  • Along the above: out with the armhf and in with the arm64 userspaces on aarch64 hw – it’s about darn time.
  • Prices need to get more realistic, but that’s a matter of market adoption, I guess. At least, for the price of the M10 one should be able to get 4GB of RAM.

NanoPC-T3 Octa-core Cortex A53 Single Board Computer Sells for $60

April 29th, 2016 8 comments

FriendlyARM launched the NanoPC-T2 single board computer based on the Samsung S5P4418 quad core Cortex A9 processor about 3 months ago, and the company now has an update based on the Samsung S5P6818 octa-core Cortex A53 processor with the exact same interfaces and features, including Gigabit Ethernet, WiFi and Bluetooth, HDMI 1.4a, 30-pin expansion headers, etc…


NanoPC-T3 specifications:

  • SoC – Samsung S5P6818 octa core Cortex A53 processor @ up to 1.4GHz with Mali-400MP GPU
  • System Memory – 1 or 2GB 32bit DDR3 RAM
  • Storage – 8GB eMMC flash, and 1x SD card slot
  • Connectivity – Gigabit Ethernet (RTL8211E), 802.11 b/g/n WiFi and Bluetooth LE 4.0 (Ampak AP6212) with on-board chip antenna and IPX antenna connector
  • Video Output / Display I/F – 1x HDMI 1.4a, LVDS, MIPI DSI, parallel RGB LCD
  • Audio I/O – HDMI, 3.5mm audio jack, on-board microphone
  • Camera – 1x DVP interface, 1x MIPI CSI interface
  • USB – 2x USB 2.0 type A host ports; 1x micro USB 2.0 OTG port; 2x USB 2.0 host ports via 8-pin header
  • Expansion Headers – 30-pin header for GPIO, 8-pin header for power signals, reset and LED 1-2
  • Debugging – 4-pin header for serial console
  • Misc – Power switch, reset button, 1x power & 2x user LEDs, RTC battery header, boot selection button (SD card / eMMC)
  • Power Supply – 5V/2A via power barrel; AXP228 PMIC
  • Dimensions – 100 x 60 mm (6-layer PCB)

The board can run Android and Debian from eMMC flash or SD card like its predecessor, as well as Ubuntu Core with Qt, and software and hardware documentation can be found on the Wiki. The board ships with the heatsink shown in the top picture.

The board can be bought on FriendlyARM website for $60 + shipping via China Post ($10), Fedex ($14) or DHL ($34). Shipping fees in brackets are for my location, so you may get other quotes.


ARM Cortex A72 is Getting into Sub $200 Android Smartphones Thanks to Mediatek Helio X20 SoC

April 7th, 2016 18 comments

Flagship Android smartphones now ship with processors featuring Cortex A72 cores, or custom variants from Samsung or Qualcomm, and normally cost several hundred dollars. Some recently announced smartphones with the MediaTek Helio X20 deca-core processor lower the price barrier to the $250 to $400 range, but at least two upcoming models will bring the cost below $200: Doogee P7 ($169.99) and Vernee Apollo Lite ($199.99). So let’s see what we’ve got for those prices.

Doogee P7

Doogee P7 (preliminary) Specifications:

  • SoC – Mediatek Helio X20 (MT6797) deca-core processor with 2x Cortex A72 cores @ 2.5 GHz, 4x Cortex A53 cores @ 2.0 GHz, and 4x Cortex A53 cores @ 1.4 GHz and ARM Mali-T880MP GPU @ 850 MHz
  • System Memory – 3GB RAM
  • Storage – 16 GB storage and micro SD slot up to 64GB (shared with dual SIM slot)
  • Display – 5.5″ capacitive touchscreen, 1920×1080 resolution; 160K colors
  • Cellular Connectivity
    • 2G – GSM 850/900/1800/1900MHz
    • 3G – WCDMA 900/2100MHz
    • 4G – FDD LTE Band 1/3/7/8/20
    • Dual SIM Card Dual Standby (One Micro SIM Card)
  • Connectivity – 802.11 b/g/n WiFi, Bluetooth 4.0, GPS/A-GPS, and FM radio
  • Camera – 13.0 MP rear camera with flash and auto-focus, 8.0 MP front-facing camera
  • USB – 1x micro USB OTG port
  • Audio – Speaker, microphone, and 3.5mm audio jack
  • Sensors – gravity, others? (TBD)
  • Battery – 3600 mAh battery
  • Dimensions – 159.6 x 82.1 x 9.5 mm
  • Weight – 170 grams

The phone will run Android 6.0, and ships with a charger and USB cable. You can find some more details on DoogeeMobile, where they show the $169.99 price tag in comments (likely without shipping). There will also be a Doogee P7 Pro version with 32GB storage and 4GB RAM, plus a better camera and display, that should sell for $200. It’s unclear when the two versions of the phone will be available, as they keep postponing them.

Vernee Apollo Lite

Vernee Apollo Lite is the little brother of the Vernee Apollo with Helio X20, 6GB RAM and 128GB storage, selling for half the price thanks to lower, but still decent, specifications (preliminary):

  • SoC – Mediatek Helio X20 (MT6797) deca-core processor with 2x Cortex A72 cores @ 2.5 GHz, 4x Cortex A53 cores @ 2.0 GHz, and 4x Cortex A53 cores @ 1.4 GHz and ARM Mali-T880MP GPU @ 850 MHz
  • System Memory – 4GB RAM
  • Storage – 32 GB storage + micro SD slot
  • Display – 5.5″ capacitive touchscreen, 1920×1080 resolution
  • Cellular Connectivity – No details yet; Dual SIM Card Dual Standby (SIM + SIM, or SIM + micro SD configuration)
  • Connectivity – TBD
  • Camera – 16.0 MP rear camera, 5.0 MP front-facing camera
  • USB – 1x USB type C port
  • Sensors – Compass, gyroscope
  • Battery – TBD
  • Dimensions – TBD
  • Weight – TBD

The phone will also run Android 6.0, and is expected to launch in May for $199.99 + shipping. The company announced the phone on their Facebook page.

Even though the complete specifications are not available for Apollo Lite and Doogee P7, both devices appear to be pretty good smartphones considering the price point.


Amlogic S912 Processor Specifications

April 6th, 2016 15 comments

Amlogic plans to launch at least three new processors for OTT boxes and set-top boxes this year: Amlogic S905X, Amlogic S912, and Amlogic S905D. We already knew Amlogic S905 specifications, but I’ve recently received a document with more details about Amlogic S912, revealing a Mali-T820 GPU and a lack of USB 3.0 support, but still some interesting features such as HDMI 2.0a, 4K VP9, 10-bit H.265, a Gigabit Ethernet MAC, and so on.


Amlogic S912 specifications with highlights in bold showing differences with Amlogic S905X:

  • CPU Sub-system – Octa core ARM Cortex-A53 CPU up to 2 GHz (DVFS) with two CPU clusters, one optimized for high performance (big) and the other for low power (LITTLE)
  • 3D Graphics Processing Unit – ARM Mali-T820MP3 GPU up to 750MHz (DVFS) with 3 shader engines supporting OpenGL ES 1.1/2.0/3.1, DirectX 11 FL9_3, OpenCL 1.1/1.2 full profile and RenderScript.
  • 2.5D Graphics Processor – Fast bitblt engine with dual inputs and single output, programmable raster operations (ROP) and polyphase scaling filter, etc.
  • Crypto Engine – AES/AES-XTS block cipher with 128/192/256 bits keys, DES/TDES block cipher, hardware crypto key-ladder operation and DVB-CSA for transport stream encryption, built-in hardware True Random Number Generator (TRNG), CRC and SHA-1/SHA-2/HMAC SHA engine
  • Video/Picture CODEC
    • Amlogic Video Engine (AVE-10) with dedicated hardware decoders and encoders
    • Supports multiple “secured” video decoding sessions and simultaneous decoding and encoding
    • Video/Picture Decoding
      • VP9-10 Profile-2 up to [email protected]
      • H.265 HEVC [email protected] up to [email protected]
      • H.264 AVC [email protected] up to [email protected], H.264 MVC up to 1080p @60fps
      • MPEG-4 [email protected] up to [email protected] (ISO-14496)
      • WMV/VC-1 SP/MP/AP up to [email protected]
      • AVS-P16(AVS+) /AVS-P2 JiZhun Profile up to [email protected]
      • MPEG-2 MP/HL up to [email protected] (ISO-13818)
      • MPEG-1 MP/HL up to [email protected] (ISO-11172)
      • RealVideo 8/9/10 up to [email protected]
      • WebM up to VGA
      • MJPEG and JPEG unlimited pixel resolution decoding (ISO/IEC-10918)
      • Supports JPEG thumbnail, scaling, rotation and transition effects
    • Video/Picture Encoding
      • Independent JPEG and H.264 encoder with configurable performance/bit-rate
      • JPEG image encoding
      • H.264 video encoding up to [email protected] with low latency
  • Video Post-Processing Engine – Dolby Vision, HDR10 and HLG HDR processing, motion adaptive 3D noise reduction filter, advanced motion adaptive edge enhancing de-interlacing engine, 3:2 pull-down support, deblocking filters, etc.
  • Video Output
    • Built-in HDMI 2.0a transmitter including both controller and PHY with 3D, CEC, HDR and HDCP 2.2, [email protected] max resolution output
    • CVBS 480i/576i standard definition output
    • RGB888 TTL interface up to 1920×1080
  • Camera Interface – ITU 601/656 parallel video input with down-scaler, supports camera input as YUV422, RGB565, 16-bit RGB or JPEG
  • Audio Decoder and Input/Output
    • Supports MP3, AAC, WMA, RM, FLAC, Ogg and programmable with 7.1/5.1 down-mixing
    • I2S audio interface supporting 8-channel (7.1) input and output
    • Built-in serial digital audio SPDIF/IEC958 output and PCM input/output
    • Built-in stereo audio DAC
    • Dual-channel digital microphone PDM input
    • Supports concurrent dual audio stereo channel output with combination of analog+PCM or I2S+PCM
  • Memory and Storage Interface
    • 16/32-bit SDRAM memory interface running up to DDR2400
    • Supports up to 2GB DDR3/4, DDR3L, LPDDR2, LPDDR3 with dual ranks
    • Supports SLC/MLC/TLC NAND Flash with 60-bit ECC
    • SDSC/SDHC/SDXC card and SDIO interface with 1-bit and 4-bit data bus width supporting up to UHS-I SDR104
    • eMMC and MMC card interface with 1/4/8-bit data bus width fully supporting spec version 5.0 HS400
    • Supports serial 1, 2 or 4-bit NOR Flash via SPI interface
    • Built-in 4k bits One-Time-Programming memory for key storage (That must be where DRM / HDCP keys are programmed)
  • Network
    • Integrated IEEE 802.3 10/100/1000M Gigabit Ethernet MAC controller with RGMII interface
    • Integrated 10/100M PHY interface
    • Supports Energy Efficiency Ethernet (EEE) mode
  • Digital Television Interface
    • Transport stream (TS) input interface with built-in demux processor for connecting to external digital TV tuner/demodulator and one output TS interface
    • Built-in PWM, I2C and SPI interfaces to control tuner and demodulator
    • Integrated CI+ port and ISO 7816 smart card controller
  • Integrated I/O Controllers and Interfaces
    • 3x USB 2.0 high-speed USB I/O, 2x USB Host and one USB OTG
    • Multiple UART, I2C and SPI interfaces with slave select
    • Multiple PWMs
    • Programmable IR remote input/output controllers
    • Built-in 10bit SAR ADC with 2 input channels
    • General Purpose IOs with built-in pull up and pull down
  • System, Peripherals and Misc. Interfaces
    • Integrated general purpose timers, counters, DMA controllers
    • 24 MHz crystal input
    • Embedded debug interface using ICE/JTAG
  • Power Management
    • Multiple external power domains controlled by PMIC, and internal ones controlled by software
    • Multiple sleep modes for CPU, system, DRAM, etc.
    • Multiple internal PLLs for DVFS operation
    • Multi-voltage I/O design for 1.8V and 3.3V
    • Power management auxiliary processor in a dedicated always-on (AO) power domain that can communicate with an external PMIC
  • Security
    • Trustzone based Trusted Execution Environment (TEE)
    • Secured boot, encrypted OTP, encrypted DRAM with memory integrity checker, hardware key ladder and internal control buses and storage
    • Protected memory regions and electric fence data partition
    • Hardware based Trusted Video Path (TVP), video watermarking and secured contents (requires SecureOS software)
    • Secured IO and secured clock
  • Package – LFBGA 15 x 15 mm, 0.65 mm ball pitch, RoHS compliant

That means Amlogic S905X and S912 have the exact same video playback capabilities, although S912 will also support the Dolby Vision HDR standard. The main differences are the eight Cortex A53 cores clocked at 2.0 GHz (instead of 4x A53 @ 1.5 GHz), the more powerful Mali-T820MP3 GPU, and support for LCD panels (e.g. for tablets) thanks to an extra RGB interface. Finally, S912 has three USB interfaces instead of just two for S905X.

Amlogic’s 2016 roadmap shows S905X is scheduled for Q1 2016, and S912 for Q2 2016, but it’s likely we need to add one or two more quarters before we get any Android 6.0 devices based on the new processors.


Rockchip PX3 and PX4 Processors Are Designed for Automotive Infotainment & Dashboards

April 5th, 2016 No comments

Rockchip PX2 processor, similar to Rockchip RK3066 but targeting industrial and automotive applications, was launched in 2014. Rockchip now has at least two new members in their PX family, PX3 and PX4, specifically designed for automotive infotainment and car dashboards thanks to dual display support, at least according to one article on Elezine.

Rockchip PX3 is definitely confirmed with its own page on Rockchip website, and features a quad core Cortex A9 @ 1.4 GHz with a Mali-400MP4 GPU, and while there’s no info about PX4 yet on the company website, the SoC should come with a quad core Cortex A53 processor @ 1.3 GHz with a Mali-T722 GPU, as well as HDMI 2.0 video output, and H.265 video decoding.

The article also lists 7 key functions of Rockchip solutions:

  1. “Quick startup and fast revert track”
  2. Navigation system with free updates
  3. HD video recording (car DVR)
  4. Advanced ADAS algorithm to achieve the trajectory, distance between vehicles, license plate recognition, collision avoidance and other functions
  5. Dual screen support
  6. Mobile Internet control
  7. Support for 1080p H.264 decoding and voice recognition input

I could not find any system or demo with a dual display setup based on PX3 (dashboard + infotainment), but did find a video of a double DIN car stereo based on the Rockchip PX3 processor and running Android 4.4.

Auto Pumpkin sells several PX3-based stereos for various car models on their website for $250 and up. Cold boot time is rather standard, however (25 to 30 seconds). I found out about the PX3 processor via an IloveRockchip tweet boasting about a “large screen in-vehicle navigation for Dongfeng Kadjar”, but I could not find any details, possibly because the news was only reported in Chinese media.


Telechips TCC898x (Alligator) 64-bit ARM SoC is Designed for High-end 4K Set-Top Boxes

March 18th, 2016 2 comments

Telechips processors were often found in consumer devices such as Android tablets, mini PCs and TV sticks a few years ago, but it’s been a while since I have seen a device based on Telechips. So after seeing an automotive SoC from the company, I decided to visit the company website to check if they were still designing processors for the consumer market, and found the TCC898x quad core Cortex A53 processor for “Smart Stick, IP-Client and STB with 4K 60fps decoding” with some interesting features.

Telechips TCC898x SoC specifications:

  • CPU – Quad core Cortex A53 processor with NEON, TrustZone, 32KB/32KB L1 cache and 512KB L2 cache
  • MCU – Cortex-M4 micro-controller
  • GPU
    • 2D – Vivante GC420 composition processing core for 4K user interfaces
    • 3D – ARM Mali-400MP2
  • VPU – Multi-format VPU and 4K VPU with HEVC and VP9 support
  • Memory I/F – DDR3/4
  • Storage I/F – NAND controller (60-bit ECC), SD/eMMC controller
  • Peripherals
    • Video Output – Display Controller, LVDS transmitter, HDMI 2.0 with HDCP 2.x, video composite
    • Video Input – Yes, but no details.
    • Audio – S/PDIF Tx and Rx, 9.1 channels I2S, stereo I2S
    • USB – USB 2.0 host, USB 2.0 OTG, USB 3.0
    • Gigabit Ethernet MAC
    • TS and TS demux (normally used to interface tuners)
    • PCIe interface
    • I2C, UART, GPSB (General Purpose Serial Bus), SDIO
    • Timer/RTC, DMA, IR receiver

The overall concept is pretty similar to their Telechips TCC893x Cortex A9 + Cortex-M3 processor, but most items have been upgraded, with the notable exception of the 3D GPU, which remains a Mali-400MP2.

Telechips TCC898x supports Linux (with HTML5 interface) and Android operating systems, and contrary to most other Android TV boxes and set-top boxes, devices based on the new processor will support 4K user interfaces too, thanks to the Vivante GC420 2D GPU. The chip also supports hardware ciphers and conditional access (CAS) for “full compliance with 4K contents security guideline for variable STB applications”. I could not find much more information, and Googling for the TCC8980 processor (and others up to TCC8989) did not return anything interesting so far. The last update to the Telechips open source page shows the company released Linux 3.4.45 source code in February 2015.


Linux 4.5 Released – Main Changes, ARM and MIPS Architectures

March 15th, 2016 1 comment

Linus Torvalds released Linux Kernel 4.5 on Sunday:

So this is later on a Sunday than my usual schedule, because I just couldn’t make up my mind whether I should do another rc8 or not, and kept just waffling about it. In the end, I obviously decided not to, but it could have gone either way.

We did have one nasty regression that got fixed yesterday, and the networking pull early in the week was larger than I would have wished for. But the block layer should be all good now, and David went through all his networking commits an extra time just to make me feel comfy about it, so in the end I didn’t see any point to making the release cycle any longer than usual.

And on the whole, everything here is pretty small. The diffstat looks a bit larger for an xfs fix, because that fix has three cleanup refactoring patches that precedes it. And there’s a access type pattern fix in the sound layer that generated lots of noise, but is all very simple in the end.

In addition to the above, there’s random small fixes all over-shortlog appended for people who want to skim the details as usual.

Go test, and obviously with 4.5 released, I’ll start the merge window for 4.6.

Linux 4.4 added support for a faster and leaner loop device, 3D support in the virtual GPU driver, TCP improvements, and various file system improvements for BTRFS, EXT4, CIFS, XFS, etc… Some notable changes made to Linux 4.5 include:

  • Copy offloading with the new copy_file_range(2) system call – Performance improvements on local file systems are marginal, but for networked file systems such as NFS, a file can be copied internally on a server drive without transferring the file data over the network.
  • Experimental PowerPlay for amdgpu driver
  • Btrfs free space handling scalability improvements – New, experimental way of representing the free space cache that takes less work overall to update on each commit and fixes the scalability issues for large drives (30TB+). It can be enabled with the -o space_cache=v2 mount option, and you can revert to the old method with -o clear_cache,space_cache=v1.
  • Support for GCC’s Undefined Behavior Sanitizer (-fsanitize=undefined) – UBSAN (Undefined Behavior SANitizer) is a debugging tool available since GCC 4.9. It inserts instrumentation code during compilation that performs checks at runtime before operations that could cause undefined behavior. Linux 4.5 supports compiling the kernel with the Undefined Behavior Sanitizer enabled.
  • Next gen media controller whose “goal is to improve the media controller to allow proper support for other types of Video4Linux devices (radio and TV ones) and to extend the media controller functionality to allow it to be used by other subsystems like DVB, ALSA and IIO”. See lkml for details
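To make the copy offloading item concrete: Python exposes copy_file_range(2) as os.copy_file_range (Python 3.8+, Linux only). Below is a minimal sketch of a copy routine that hands the data movement to the kernel and falls back to a plain user-space copy when the call is unavailable or refused (other OSes, kernels older than 4.5, or unsupported file system combinations). copy_with_offload is a hypothetical helper, not a library API.

```python
# Sketch: copy a file via copy_file_range(2), with a user-space fallback.
# copy_with_offload is a hypothetical helper name, not a library API.
import os
import shutil


def _kernel_copy(src_fd: int, dst_fd: int) -> int:
    """Ask the kernel to move all bytes; both fds' offsets are advanced."""
    remaining = os.fstat(src_fd).st_size
    copied = 0
    while remaining > 0:
        n = os.copy_file_range(src_fd, dst_fd, remaining)
        if n == 0:  # source shrank or hit EOF early
            break
        copied += n
        remaining -= n
    return copied


def copy_with_offload(src: str, dst: str) -> int:
    """Copy src to dst and return the number of bytes written."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            return _kernel_copy(fsrc.fileno(), fdst.fileno())
        except (AttributeError, OSError):
            # AttributeError: no os.copy_file_range (non-Linux, Python < 3.8).
            # OSError: e.g. ENOSYS on pre-4.5 kernels, or an unsupported
            # file system pair. Start over with an ordinary copy.
            fsrc.seek(0)
            fdst.seek(0)
            fdst.truncate()
            shutil.copyfileobj(fsrc, fdst)
            return fdst.tell()
```

On a local file system the benefit is marginal, as noted above; the interesting case is an NFS mount, where the server may be able to perform the copy without the data crossing the network.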

Some new features and improvements specific to the ARM architecture:

  • Allwinner:
    • Allwinner A80 support – IR receiver driver, NMI controller, PRCM driver, R_PIO support, and RSB driver
    • Allwinner H3 SoC support – H3 USB PHY clocks
    • A10/A20 Video Engine clocks
    • MIC1 capture for sun4i codec
    • Audio codec enabled on various boards
    • Added board – Orange Pi Plus
  • Rockchip:
    • Crypto module and io-domain driver enabled in multi_v7_defconfig
    • Tweaks for RK3368 SoC and eval board
    • Added Rockchip RK3228 SoC and eval board
    • New RK3228 subdriver in pinctrl
    • SPI driver fix
    • Added support for RK3399 in thermal driver
    • RK3036: Added SMP support, emac support
    • Expose USB PHY PLLs
  • Amlogic
    • Device tree changes – Add watchdog node to meson8b, add status LED for ODROID-C1
    • Watchdog timer modifications
  • Samsung
    • Minor fixes to the usage of eMMC/SDIO bindings on Snow and Peach Chromebooks.
    • Remove FIMD from Odroid XU3-family because on XU3 it cannot be used yet and on XU3-Lite and XU4 it is not supported.
    • Remove samsung,exynos5-hdmi, deprecated since June 2013.
    • Add support for Pseudo Random Generator on Exynos4 (Trats2 for now). This depends on new SSS clock.
    • Add rotator nodes for Exynos4 and Exynos5.
    • Switch DWC3_1 on Odroid XU3 and XU3-Lite to peripheral mode because now it cannot be used as OTG.
    • Cleanup the G2D usage on Exynos4 and add it to a proper domain in case of Exynos4210.
    • Put MDMA1 in proper domain on Exynos4210 as well.
    • Minor cleanups
  • Qualcomm
    • New pinctrl subdrivers for Qualcomm MSM8996, PM8994, PM8994 MPP support
    • Added Qualcomm PCIe controller driver
    • Qualcomm ARM64: Add fixed rate oscillators to dts, fixup PMIC alias and properties, change 8916-MTP compatible to be compliant with new scheme, fix 8×16 UART pinctrl configuration, add SMEM, RPM/SMD, and PM8916 support on MSM8916
  • ARM SoC multiplatform code – “This branch is the culmination of 5 years of effort to bring the ARMv6 and ARMv7 platforms together such that they can all be enabled and boot the same kernel”
  • ARM64 – hugetlb: add support for PTE contiguous bit; perf: add support for Cortex-A72.
  • Other new hardware or SoCs – Sigma Designs ARM Cortex-A9 Tango4 “Secure Media Processor” platforms (SMP8756, SMP8758, and SMP8759), TI-based DM3730 from LogicPD (Torpedo), Cosmic+ M4 (nommu) initial support (Freescale Vybrid), Veyron-mickey (ASUS Chromebit), BCM2836 and Raspberry Pi 2 B.

MIPS changes:

  • Add support for PIC32MZDA platform
  • bcm963xx: Add Broadcom BCM963xx board nvram data structure
  • dts: Add initial DTS for the PIC32MZDA Starter Kit
  • math-emu: Add IEEE Std 754-2008 ABS.fmt and NEG.fmt emulation
  • math-emu: Add IEEE Std 754-2008 NaN encoding emulation
  • math-emu: Add IEEE Std 754 conformance mode selection
  • pci: Add MT7620a PCIE driver
  • ralink: add MT7621 support
  • zboot: Add support for serial debug using the PROM

If you want to get the full details, I’ve generated Linux 4.5 Changelog with comments only (12.2MB) using git log v4.4..v4.5 --stat, but it’s probably a better idea to simply check out Linux 4.5 changelog on kernelnewbies.org.
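If you would rather extract a few numbers from such a log than read 12MB of it, the per-commit summary lines that `--stat` prints (`N files changed, X insertions(+), Y deletions(-)`) are easy to total. A small Python sketch, assuming English git output; the file count it returns is the number of file changes summed over commits, not the number of unique files.

```python
# Sketch: total the summary lines of `git log --stat` output.
import re

# e.g. " 3 files changed, 10 insertions(+), 2 deletions(-)"
SUMMARY = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def stat_totals(log_text: str) -> tuple:
    """Return (file changes, insertions, deletions) summed over all commits."""
    files = insertions = deletions = 0
    for m in SUMMARY.finditer(log_text):
        files += int(m.group(1))
        insertions += int(m.group(2) or 0)
        deletions += int(m.group(3) or 0)
    return files, insertions, deletions
```

One way to feed it, where kernel_tree is a hypothetical path to a Linux checkout: `stat_totals(subprocess.run(["git", "log", "--stat", "v4.4..v4.5"], cwd=kernel_tree, capture_output=True, text=True).stdout)`.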


64-bit ARM (Aarch64) Instructions Boost Performance by 15 to 30% Compared to 32-bit ARM (Aarch32) Instructions

March 1st, 2016 12 comments

Yesterday was quite an eventful day with the launch of two low cost 64-bit ARM development boards, namely Raspberry Pi 3 and ODROID-C2, and as usual there were some pretty interesting discussions related to the launch of the boards in the comments section. One of the subjects that came up is that while the Raspberry Pi 3 board uses a 64-bit processor, the operating systems are still compiled with 32-bit instructions (Aarch32) and even optimized for ARMv6, and they intend to keep it that way according to an interview with Eben Upton:

Eben readily admits that not all the capabilities of the new parts are going to be used at launch, however. “Although it is a 64‑bit core, we’re using it as just a faster 32-bit core,” he reveals about the Pi 3’s central processing unit. “I can imagine there’d be some real benefits [to 64-bit code]. The downside is that you do really create a separate world. To access that benefit, you’d have to have two operating systems. I’m hoping that someone will come and demonstrate to me that this is a good idea. But there are some really compelling advantages to still being basically ARMv6, and because it’s [Cortex-]A53 it’s a really good 32‑bit processor.”

So the clear advantage of running ARMv6 32-bit code is that a single image can be used for all Raspberry Pi boards, while if they had to optimize code for each board, they’d have one image for Raspberry Pi (ARMv6), one for Raspberry Pi 2 (ARMv7), and a final one for Raspberry Pi 3 (ARMv8), which would obviously require a lot of work behind the scenes. In theory, there should be a performance advantage to running 64-bit ARM instructions, but the question is: how much?

ARM brings some perspective on the performance improvement in its presentation “ARMv8: Advantages for Android”, where it compares the performance of AArch64 (64-bit ARM instructions) with AArch32 (32-bit ARM instructions) by running benchmarks compiled for either instruction set on the Juno development board.


The first charts show that native (C/C++) code runs between 15% and about 20% faster in the bionic benchmarks, as well as in the Antutu 5.0 single-thread and multi-thread CPU tests.


The second chart shows that ART (Java runtime) performance is also about 15% better with AArch64 in the Quadrant 2.0 CPU score, and close to 30% better in the multi-threaded Linpack benchmark.
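"X% faster" can mean slightly different things depending on whether a benchmark reports a score (higher is better, e.g. Antutu or Quadrant) or an elapsed time (lower is better). The short derivation below uses illustrative symbols S for scores and T for times; it is a reading aid, not how ARM computed its charts.

```latex
% Score-based benchmarks (higher is better):
\text{improvement} = \frac{S_{64}}{S_{32}} - 1,
\qquad S_{64} = 1.15\,S_{32} \;\Rightarrow\; \text{improvement} = 15\%

% Time-based benchmarks (lower is better):
\text{speedup} = \frac{T_{32}}{T_{64}},
\qquad \text{time saved} = 1 - \frac{T_{64}}{T_{32}}

% A 15% higher score on a fixed workload means the run takes 1/1.15 of the time:
1 - \frac{1}{1.15} \approx 0.13 \;\Rightarrow\; \text{about 13\% less time}
```

So a "15% improvement" in a score translates into roughly 13% less time spent on the same work.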

The Broadcom BCM2837 processor’s Cortex-A53 cores are likely to be further penalized, since they run code compiled for the older ARMv6 architecture, which is slower than ARMv7 code. Let’s take another fun example. Raspberry Pi 3 benchmarks released on MagPi show that sysbench completes the multi-threaded CPU test in 49.02 seconds, while tkaiser, an active developer for the Armbian project, ran sysbench on a Pine A64 development board running Ubuntu 16.04 64-bit, and the results are quite surprising considering the Allwinner A64 is also a quad-core Cortex-A53 processor @ 1.2 GHz.

So it took only 3.25 seconds on the Pine A64 with ARMv8 instructions, compared to 49.02 seconds on the Raspberry Pi 3 with ARMv6 instructions, so if you are specifically looking for prime numbers, it pays big time (about 15 times faster) to switch to AArch64 instructions. Bear in mind that the sysbench command-line benchmark has options that can affect the results, and sadly we don’t have the exact command line used for the Raspberry Pi 3, but they most likely used the default options as above (maximum prime number: 10,000), since another person ran the benchmark with a maximum of 20,000 on the RPi 3, and it completed in around 119 seconds.
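To see what the sysbench CPU test actually times, here is a minimal Python sketch of its prime search, written from my recollection of the sysbench 0.4 source rather than copied from it: each event counts the primes below the maximum prime by naive trial division. If I recall correctly, the C original declares its loop variables as unsigned long long, so every % is a 64-bit division; AArch64 does that in a single instruction, while 32-bit ARM has no 64-bit divide instruction and calls a runtime helper such as __aeabi_uldivmod, which may explain a large part of the gap. The function names below are hypothetical.

```python
import math


def sysbench_style_prime_count(max_prime: int = 10000) -> int:
    """Count primes c with 3 <= c < max_prime using naive trial division.

    Assumed to mirror the inner loop of the sysbench 0.4 CPU test:
    for each candidate c, try every divisor l from 2 up to sqrt(c).
    """
    count = 0
    for c in range(3, max_prime):
        limit = math.isqrt(c)  # largest l with l * l <= c
        for l in range(2, limit + 1):
            if c % l == 0:  # found a divisor: c is composite
                break
        else:  # loop finished without break: no divisor, c is prime
            count += 1
    return count


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Speedup for a time-based benchmark (lower time is better)."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    print(sysbench_style_prime_count())    # one "event" at the default limit of 10,000
    print(round(speedup(49.02, 3.25), 1))  # RPi 3 vs Pine A64 times quoted above
```

The first call counts the 1228 odd primes below 10,000, and the second prints 15.1, the speedup between the two boards. Because the work per candidate grows roughly with its square root, raising the limit to 20,000 more than doubles the run time, which is consistent with the roughly 119 seconds reported for the RPi 3.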

Which specific improvements of ARMv8 may bring the extra performance? Reader and commenter “Blu” explains:

Well, for one, compiler’s autovectorization actually works with aarch64 NEON, whereas in armv7 you had mostly to rely on manual vectorization via inline asm. Another big win is the twice-larger GPR & FPR files (when it comes to fp64: D16 -> D32), largely reducing register pressure in compiled (and not only) code. Last but not least, recent compilers have been more focused on AArch64, where they could produce better code vs armv7 not so much because of hw resource discrepancies, but because more man-effort went into AArch64 backends (and the arch provides a bunch of small tweaks that make compiler writer’s lives easier).

To sum it up, one can observe a significant speedup from armv7 to AArch64 for both objective (i.e. larger hw resources) and subjective (i.e. greater man-effort) reasons.

Now, the Raspberry Pi 3 is not the only platform to use a 32-bit operating system, as most Android devices and boards I’ve tested so far, excluding the DragonBoard 410c, combine a 64-bit kernel with a 32-bit user space. The ODROID-C2 board, however, will support Ubuntu 16.04 64-bit ARM (aka ARM64).
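If you want to check whether a given device runs a 32-bit or 64-bit user space, one option is to read the ELF header of one of its system binaries: byte 4 gives the class (32 or 64-bit), byte 5 the endianness, and the 16-bit field at offset 18 is e_machine, where 40 means 32-bit ARM and 183 means AArch64. The Python sketch below decodes just those fields; the helper names are hypothetical, and it inspects user-space binaries only, not the kernel.

```python
import struct

EM_ARM = 40       # e_machine value for 32-bit ARM (AArch32)
EM_AARCH64 = 183  # e_machine value for 64-bit ARM (AArch64)


def elf_arch(header: bytes) -> str:
    """Describe an ELF file's class and machine from its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # 1 = ELFCLASS32, 2 = ELFCLASS64
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_machine follows the 16-byte e_ident and the 2-byte e_type.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    bits = {1: 32, 2: 64}.get(ei_class, 0)
    name = {EM_ARM: "arm", EM_AARCH64: "aarch64"}.get(e_machine, f"machine {e_machine}")
    return f"{bits}-bit {name}"


def elf_arch_of_file(path: str) -> str:
    """Read the header of a binary on disk and describe it."""
    with open(path, "rb") as f:
        return elf_arch(f.read(20))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, "->", elf_arch_of_file(path))
```

For example, pointing it at /bin/ls on a Linux board, or at a binary under /system/bin on an Android device, should report "32-bit arm" on a 32-bit user space even when the kernel itself is 64-bit.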

There is, however, a side effect of compiling code with 64-bit instructions: the binaries get bigger. Another reader, “Jon”, compiled code for a Rockchip RK3128 Cortex-A7 processor (ARMv7/32-bit) and a Pine A64 Cortex-A53 processor (ARMv8/64-bit), and found some large differences in binary size.

Binary                  | ARMv7 size (bytes) | ARMv8 size (bytes) | Ratio
libcrypto.so            | 1,052,920          | 1,673,400          | 1.59x
toolbox (Android 5.1)   | 150,836            | 255,280            | 1.69x

So if you are really tight on storage or memory, 32-bit code might be a better option.
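The ratios in Jon's table are simply the ARMv8 size divided by the ARMv7 size. The Python sketch below recomputes them, along with the extra bytes, and includes a hypothetical helper to take the same measurement on two builds of your own binary.

```python
import os


def size_ratio(armv7_bytes: int, armv8_bytes: int) -> float:
    """How many times larger the ARMv8 (64-bit) build is than the ARMv7 one."""
    return armv8_bytes / armv7_bytes


def file_size_ratio(armv7_path: str, armv8_path: str) -> float:
    """Same ratio, measured from two builds of the same binary on disk."""
    return size_ratio(os.path.getsize(armv7_path), os.path.getsize(armv8_path))


# Sizes in bytes from Jon's measurements in the table.
JON_SIZES = {
    "libcrypto.so": (1_052_920, 1_673_400),
    "toolbox (Android 5.1)": (150_836, 255_280),
}

if __name__ == "__main__":
    for name, (v7, v8) in JON_SIZES.items():
        print(f"{name}: {size_ratio(v7, v8):.2f}x, +{v8 - v7:,} bytes")
```

This prints 1.59x (+620,480 bytes) for libcrypto.so and 1.69x (+104,444 bytes) for toolbox, matching the table.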
