Archive

Posts Tagged ‘intel’

LEAGOO T5c Smartphone Features Spreadtrum SC9853i Octa-core Intel Airmont SoC

November 16th, 2017 7 comments

Intel is supposed to have left the mobile and IoT markets, but a few months ago I wrote about the Spreadtrum SC9861G-IA, an octa-core Intel Airmont SoC designed for LTE smartphones. Airmont is the microarchitecture used in Intel’s Cherry Trail and Braswell SoCs, so the Spreadtrum SoC is not based on a new microarchitecture, but it shows Intel still intends to use the technology, just without its own name on the processor.

The news was published in February, but so far I have not seen any phone based on that processor. Instead, a similarly specced SoC, namely the Spreadtrum SC9853i, is now found in the LEAGOO T5c 5.5″ smartphone with 3GB RAM and 32GB flash.

LEAGOO T5c smartphone specifications:

  • SoC – Spreadtrum SC9853i octa-core Intel 64-bit Airmont “Cherry Trail-T” processor @ up to 1.8 GHz (14nm FinFET process)
  • System Memory – 3GB RAM
  • Storage – 32GB eMMC flash
  • Display – 5.5″ SHARP Full HD IPS display
  • Cellular Connectivity – LTE cat 6 and dual 4G networks
  • Camera – 13.0 MP + 2.0 MP dual rear camera with autofocus, front-facing camera
  • Misc – Front fingerprint scanner
  • Battery – 3,000 mAh battery with 5V/2A “quick charge”

The operating system is not mentioned at all, but it’s probably safe to assume it’s running some version of Android.

The company claims the SC9853i delivers 30% lower power consumption, and is 25% to 39% faster than the MediaTek MT6750 octa-core ARM Cortex A53 processor @ 1.5 GHz (cluster 1) / 1.0 GHz (cluster 2) in single core, multi-core and “CPU total” – whatever that means – performance.

LEAGOO T5c retail price will be around $129.99, but there is a launch promotion offering the phone for $1.99 to 5 winners on December 4, as well as a $30 discount coupon to some of the participants, making it a $100 phone. For comparison, the LEAGOO T5 smartphone, based on the MediaTek MT6750T SoC but with 4GB RAM/64GB storage instead of just 3GB/32GB, currently sells for $128 shipped.

First OpenCL Encounters on Cortex-A72: Some Benchmarking

November 14th, 2017 4 comments

This is a guest post by blu about his experience with OpenCL on the MacchiatoBin board with a quad-core Cortex-A72 processor and on an Intel based MacBook. He previously contributed several technical articles such as How ARM Nerfed NEON Permute Instructions in ARMv8 or OpenGL ES development on Ubuntu Touch.

Qualcomm launched their long-awaited ARM server chip the other day, and we started getting the first benchmarks. Incidentally, I too managed to get some OpenCL ray-tracing code running on an ARM Cortex-A72 machine that same day (thanks to pocl – an LLVM-based, open-source, multi-platform OCL implementation), so my benchmarking curiosity got the better of me.

The code in question is a (half-finished) OCL port of a graphics demo from 2014. Some remarks on what it does:

For each frame: a single thread builds a sparse voxel octree from a dynamic voxel scene; the octree, along with the current camera settings, is passed to an OCL kernel via double buffering; the kernel computes a screen-space map of object IDs from primary-ray-hit voxels (the kernel utilizes all compute units of a user-specified device); then, in the headless mode used for the test, the app discards the frame. The test continues for a user-specified number of frames, and reports the average frames per second (FPS) upon termination.
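The demo source itself is not reproduced here, but if you want to check what your own pocl installation exposes before running anything heavier, a minimal device-enumeration sketch along the following lines will do (my own illustration, not part of the demo; it assumes the OpenCL headers and an ICD such as pocl are installed, and builds with g++ -std=c++11 list_devices.cpp -lOpenCL):

#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    // Enumerate OpenCL platforms (e.g. pocl on the ARM board, Apple's stack on the MacBook).
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, nullptr, &num_platforms);
    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        char platform_name[256] = {0};
        clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(platform_name), platform_name, nullptr);
        std::printf("Platform: %s\n", platform_name);

        // List every device on the platform along with its compute-unit count.
        cl_uint num_devices = 0;
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &num_devices);
        std::vector<cl_device_id> devices(num_devices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, num_devices, devices.data(), nullptr);

        for (cl_device_id d : devices) {
            char device_name[256] = {0};
            cl_uint compute_units = 0;
            clGetDeviceInfo(d, CL_DEVICE_NAME, sizeof(device_name), device_name, nullptr);
            clGetDeviceInfo(d, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(compute_units), &compute_units, nullptr);
            std::printf("  Device: %s, compute units: %u\n", device_name, compute_units);
        }
    }
    return 0;
}

On a quad-core CPU, pocl should report a single CPU device with 4 compute units, which is what the primary-ray kernel spreads its work across.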

Now, one of the baselines I wanted to compare the ARM machine against was a MacBook with Penryn (Intel Core 2 Duo Processor P8600), as the latter had exhibited very similar IPC characteristics to the Cortex-A72 in previous (non-OCL) tests, and also both machines had very similar FLOPS paper specs (and our OCL test is particularly FP-heavy):

  • 2x Penryn @ 2400MHz: 4xfp32 mul + 4xfp32 add per clock = 38.4GFLOPS total
  • 4x Cortex-A72 @ 1300MHz: 4xfp32 mul-add per clock = 41.6GFLOPS total
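(Both totals are simply cores × clock × fp32 operations per clock: 2 × 2.4 GHz × 8 = 38.4, and 4 × 1.3 GHz × 8 = 41.6, counting each A72 mul-add as two flops.)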

Beyond paper specs, on a SGEMM test the two machines showed the following performance for cached data:

  • Penryn: 4.86 flop/clock/core, 23.33GFLOPS total
  • Cortex-A72: 6.52 flop/clock/core, 33.90GFLOPS total

And finally RAM bandwidth (again, paper specs):

  • Penryn: 8.53GB/s (DDR3 @ 1066MT/s)
  • Cortex-A72: 12.8GB/s (DDR4 @ 1600MT/s)
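(Both figures correspond to a single 64-bit memory channel, i.e. 8 bytes per transfer: 1066 MT/s × 8 B ≈ 8.53 GB/s and 1600 MT/s × 8 B = 12.8 GB/s.)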

On the ray-tracing OCL test, though, things turned out interesting (MacBook running Apple’s own OCL stack, which, to the best of my knowledge, is also LLVM-based):

  • Penryn average FPS: 2.31
  • Cortex-A72 average FPS: 7.61

So while on the SGEMM test the ARM was ~1.5x faster than Penryn for cached data, on the ray-tracing test, which is much more complex code than SGEMM, the ARM speedup turned out to be ~3x? Remember, we are talking about two μarchs that perform quite closely in general-purpose-code IPC. Could something be wrong with Apple’s OCL stack? Let’s try pocl (exact same version of pocl and LLVM as on ARM):

  • Penryn average FPS: 11.58

OK, that’s much more reasonable. This time Penryn holds a speed advantage of 1.5x. Now, while Penryn is a fairly mature μarch that reached its toolchain-support peak long ago, could we expect improvements from LLVM’s (and pocl’s) support for the Cortex family? Perhaps. In the case of our little test I could even finish the AArch64 port of the non-OCL version of this code (originally x86-64 with SSE/AVX), but hey, OCL saved me the initial effort of satisfying my curiosity!

[Update: See comment for new ARM Cortex A72 and A53 results after fixing some codegen issues]

What is more interesting, though, is that assuming a Qualcomm Falkor core is at least as performant as a Cortex-A72 core in both general-purpose and NEON IPC (not a baseless supposition), and taking into account that the top-specced Centriq 2400 has 12x the cores and 10x the RAM bandwidth of our ARM machine, we can speculate about Centriq 2400’s performance on this OCL test when using the same OCL stack.

Hypothetical Qualcomm Centriq 2400 server: 48x Falkor @ 2200-2600 MHz, 6x DDR4 @ 2667 MT/s (128 GB/s)

This assumes linear scaling from the measured ARMADA 8040 performance; in practice, the single-threaded part of the test will impede linear scaling, and so could the slightly lower per-core RAM bandwidth paper specs.
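As a purely illustrative back-of-the-envelope number (my own extrapolation, subject to all the caveats above): scaling the measured 7.61 FPS by core count alone gives 7.61 × 12 ≈ 91 FPS, and factoring in the higher base clock as well (2.2 GHz vs 1.3 GHz) would push that to roughly 155 FPS.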

Of course, CPU-based solutions are not the best candidate for this OCL test — a decent GPU would obliterate even a 2S Xeon server here. But the goal of this entire test was to get a first-encounter estimate of the Cortex-A72 for FP-heavy non-matrix-multiplication-trivial scenarios, and things can go only up from here. Raw data for POCL tests on MacchiatoBin and MacBook is available here.

Qualcomm Centriq 2400 ARM SoC Launched for Datacenters, Benchmarked against Intel Xeon SoCs

November 9th, 2017 12 comments

Qualcomm Centriq 2400 ARM Server-on-Chip has been four years in the making. The company announced sampling in Q4 2016 using 10nm FinFET process technology with the SoC featuring up to 48 Qualcomm Falkor ARMv8 CPU cores optimized for datacenter workloads. More recently, Qualcomm provided a few more details about the Falkor core, fully customized with a 64-bit only micro-architecture based on ARMv8 / Aarch64.

Finally, here it is: the SoC has formally launched, with the company announcing commercial shipments of Centriq 2400 SoCs.

Qualcomm Centriq 2400 key features and specifications:

  • CPU – Up to 48 physical ARMv8 compliant 64-bit only Falkor cores @ 2.2 GHz (base frequency) / 2.6 GHz (peak frequency)
  • Cache – 64 KB L1 instruction cache with 24 KB single-cycle L0 cache, 512 KB L2 cache per duplex; 60 MB unified L3 cache; Cache QoS
  • Memory – 6 channels of DDR4 2667 MT/s  for up to 768 GB RAM; 128 GB/s peak aggregate bandwidth; inline memory bandwidth compression
  • Integrated Chipset – 32 PCIe Gen3 lanes with 6 PCIe controllers; low speed I/Os; management controller
  • Security – Root of trust, EL3 (TrustZone) and EL2 (hypervisor)
  • TDP – < 120W (~2.5 W per core)


The SoC is ARM SBSA v3 compliant, meaning it can run any compliant operating system without having to resort to “cute embedded nonsense hacks”. The processor is optimized for cloud workloads, and the company explains the SoC has already been demonstrated for the following tasks:

  • Web front end with HipHop Virtual Machine
  • NoSQL databases including MongoDB, Varnish, Scylladb
  • Cloud orchestration and automation including Kubernetes, Docker, metal-as-a-service
  • Data analytics including Apache Spark
  • Deep learning inference
  • Network function virtualization
  • Video and image processing acceleration
  • Multi-core electronic design automation
  • High throughput compute bioinformatics
  • Neural class networks
  • OpenStack Platform
  • Scaleout Server SAN with NVMe
  • Server-based network offload

Three Qualcomm Centriq 2400 SKUs are available today:

  • Centriq 2434 – 40 cores @ 2.3 / 2.5 GHz; 50 MB L3 cache; 110W TDP
  • Centriq 2452 – 46 cores @ 2.2 / 2.6 GHz; 57.5 MB L3 cache; 120W TDP
  • Centriq 2460 – 48 cores @ 2.2 / 2.6 GHz; 60 MB L3 cache; 120W TDP

Qualcomm Centriq 2460 (48-cores) was compared to an Intel Xeon Platinum 8160 with 24-cores/48 threads (150 W) and found to perform a little better in both integer and floating point benchmarks.

The most important metrics for server SoCs are performance per thread, performance per watt, and performance per dollar, so Qualcomm pitted the Centriq 2460, 2452, and 2434 respectively against the Intel Xeon Platinum 8180 (28 cores/205W TDP), Xeon Gold 6152 (22 cores/140W TDP), and Xeon Silver 4116 (12 cores/85W TDP). Performance per watt was found to be significantly better for the Qualcomm chips in the SPECint_rate2006 benchmark.

Performance per dollar of the Qualcomm SoCs looks excellent too, but…

Qualcomm took Xeon SoC pricing from Intel’s ARK, and in the past, prices there have not reflected the real selling price of the chips, at least for low power Apollo Lake / Cherry Trail processors.

This compares to the prices for Centriq 2434 ($880), Centriq 2452 ($1,373), and Centriq 2460 ($1,995).

Qualcomm also boasted better performance per mm², and a typical power consumption for the Centriq 2460 under load of around 60W, well below the 120W TDP. Idle power consumption is around 8 watts using C1 mode, and under 4 watts when all idle states are enabled.

If you are wary of company provided benchmarks, Cloudflare independently tested Qualcomm Centriq and Intel Skylake/Broadwell servers using OpenSSL speed, compression algorithms (gzip, brotli…), Go, the NGINX web server, and more.

Multicore OpenSSL Performance

Usually, Intel’s single core performance is better, but since the ARM chip has more cores, multi-threaded performance is often better on ARM. Here’s their conclusion:

The engineering sample of Falkor we got certainly impressed me a lot. This is a huge step up from any previous attempt at ARM based servers. Certainly core for core, the Intel Skylake is far superior, but when you look at the system level the performance becomes very attractive.

The production version of the Centriq SoC will feature up to 48 Falkor cores, running at a frequency of up to 2.6GHz, for a potential additional 8% better performance.

Obviously the Skylake server we tested is not the flagship Platinum unit that has 28 cores, but those 28 cores come both with a big price and over 200W TDP, whereas we are interested in improving our bang for buck metric, and performance per watt.

Currently my main concern is weak Go language performance, but that is bound to improve quickly once ARM based servers start gaining some market share.

Both C and LuaJIT performance is very competitive, and in many cases outperforms the Skylake contender. In almost every benchmark Falkor shows itself as a worthy upgrade from Broadwell.

The largest win by far for Falkor is the low power consumption. Although it has a TDP of 120W, during my tests it never went above 89W (for the go benchmark). In comparison Skylake and Broadwell both went over 160W, while the TDP of the two CPUs is 170W.

Back to software support: the SoC is supported by a large ecosystem with technologies such as memcached, MongoDB, MySQL…, cloud management solutions such as OpenStack and Kubernetes, programming languages (Java, Python, PHP, Node, Golang…), tools (GCC/LLVM, GDB…), virtualization solutions including KVM, Xen and Docker, as well as operating systems like Ubuntu, Red Hat, SUSE, and CentOS.

Qualcomm is already working on its next generation SoC, Firetail, based on the Qualcomm Saphira core, but no details have been provided yet.

Thanks to David for the links.

MINIX based Intel Management Engine Firmware & UEFI are Closed Source & Insecure, NERF to the Rescue!

November 7th, 2017 8 comments

You may have heard a few things about the Intel Management Engine in recent months, especially as security issues have been found, the firmware is not easily upgradeable, and the EFF has deemed it a security hazard, asking Intel for ways to disable it.

In recent days, I’ve seen several media reports about the Management Engine being based on an Intel Quark x86 32-bit CPU running the MINIX open-source operating system. Keep in mind there’s nothing nefarious about MINIX itself, it’s just that Intel keeps its own developments on top closed. One of the sources for the information is a blog post explaining how to disable Intel ME 11, but ZDNET also points to one of the talks at the Embedded Linux Conference Europe 2017, entitled “Replace Your Exploit-Ridden Firmware with Linux” by Ronald Minnich, Google, which explains the problem, and proposes a solution to (almost) disable Intel’s ME and replace UEFI with a small open source Linux kernel and ramdisk.


To better understand the issue, we’ll first need to talk about rings… yes… rings. Protection rings numbered from -3 to +3 indicate the level of privilege, with Ring 3 being the least privileged, and Ring -3 giving full access to all hardware. This is unrelated to OS privileges: normal users and root are both part of the same Ring 3 privilege level.

Rings 0 to 3 are well documented, and Ring -1 is for hypervisors like Xen, so it is also “known to mankind”, but while we know something about the Ring -2 CPU, the code for the UEFI kernel and SMM half-kernel is not always known, and in the case of the Ring -3 kernels, which include the Management Engine, Integrated Sensor Hub (ISH), and Innovation Engine (IE), we know very little about the hardware and software, despite them having the highest privileges.

We do know the Management Engine Ring -3 OS provides the following features:

  • Full Network manageability
  • Regular Network manageability
  • Manageability
  • Small business technology
  • Level III manageability
  • Intel® Anti-Theft (AT)
  • Intel® Capability Licensing Service (CLS)
  • Intel® Power Sharing Technology (MPC)
  • ICC Over Clocking
  • Protected Audio Video Path (PAVP)
  • IPV6
  • KVM Remote Control (KVM)
  • Outbreak Containment Heuristic (OCH)
  • Virtual LAN (VLAN)
  • TLS
  • Wireless LAN (WLAN)

If you have no idea what some of these features do, that’s OK, as even Ronald is unclear about some of them. The important part is that the firmware has a full network stack, and web servers are running for remote management, which has recently become a serious problem as Intel found a vulnerability in “Intel Active Management Technology (AMT), Intel Standard Manageability (ISM), and Intel Small Business Technology” that can allow “an unprivileged attacker to gain control of the manageability features”. It requires a firmware update, but considering this affects Intel’s first through seventh generation processors, the bug is at least 9 years old, and most systems won’t be updated. Read this PDF for a detailed (71-page) security evaluation of Intel’s ME.

Besides the Management Engine, the presentation also goes through the Ring -2 OS (UEFI), an extremely complex kernel (millions of lines of code) running on the main CPU, whose security model is… obscurity. UEFI exploits also exist, and can be made permanent since UEFI can rewrite itself. The firmware also runs at all times, and exploits are undetectable by kernels and programs.

The solution proposed to address the privacy and security issues related to ME and UEFI is called NERF (Non-Extensible Reduced Firmware).

It’s only a partial solution because the system cannot fully boot without the ME, but they’ve managed to reduce the ME firmware size from 5MB to 300KB on a Minnowboard MAX board thanks to me_cleaner, removing the web server and IP stack in the process. The SMM Ring -2 semi-kernel can however be disabled entirely.

UEFI stands for Unified Extensible Firmware Interface, so to simplify the code and make it less vulnerable to exploits they’ve made their implementation NON-extensible, and as a result it’s much simpler, as illustrated in the diagram below.

UEFI vs NERF

The UEFI DXE stage is replaced with a single Linux kernel (tied to the BIOS vendor), and a 5.9MB firmware-based root file system written in Go (u-root.tk). If you are interested in the details, watch the 38-minute ELCE 2017 presentation.

You may also be interested in the slides.

Arduino Create Adds Support for Linux Development Boards (based on Intel processors for now)

November 7th, 2017 No comments

Most people are used to programming Arduino compatible boards with the Arduino IDE installed on their Windows/Linux/Mac OS computer, managing everything locally. But Arduino introduced Arduino Create last year, which includes the Arduino Web Editor, allowing you to perform the same tasks in your web browser and save your files in the cloud.

The company has now added Linux support to Arduino Create, so users can program their Linux devices as if they were regular Arduino boards, and easily deploy IoT applications with integrated cloud services. The initial release has been sponsored by Intel and currently supports x86/x86_64 boards, but other hardware architectures will be supported in the coming months.
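Sketches targeting a Linux board look just like regular Arduino code. The example below is purely illustrative (the pin number is an arbitrary placeholder, not something from the announcement); it simply blinks an LED on a pin that the hardware abstraction layer maps to a physical GPIO on the Linux target:

const int LED_PIN = 13;  // placeholder pin; the real GPIO depends on the board mapping

void setup() {
  pinMode(LED_PIN, OUTPUT);     // configure the pin as an output
}

void loop() {
  digitalWrite(LED_PIN, HIGH);  // LED on
  delay(1000);                  // wait one second
  digitalWrite(LED_PIN, LOW);   // LED off
  delay(1000);
}

The same sketch can be built and deployed from the web editor to either a genuine Arduino board or one of the supported Linux targets.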


In the meantime, the AAEON UP2 board is the best platform to get started with, as a complete getting started guide is available for it. Other mini PCs such as Intel NUC, Dell Wyse, and Gigabyte GB-BXT are also supported, and you’ll find more generic instructions to get started with those.

Multiple Arduino programs can run simultaneously on a Linux device and communicate with each other thanks to the MQTT based Arduino Connector. There are currently three projects based on the UP Squared board on the Project Hub, and if you need help, a dedicated forum has been launched.

Intel provided a few more details about the initiative in their announcement, highlighting the following points:

  • Reduce set up time with native integration of UP Squared Grove Development Kit with Arduino Create
  • Pre-installed custom Ubuntu Server 16.04 OS on the UP Squared Grove Development Kit
  • Simple getting started experience in Arduino Create for Intel based IoT platforms running Ubuntu on Intel Atom, Intel Core, or Intel Xeon processors.
  • Integrated libraries and SDKs such as UPM sensor libraries supporting over 400 sensors, OpenCV, Intel Math Kernel Library, Amazon Web Services (AWS), Microsoft Azure, etc…
  • Supports the ability to run multiple sketches / programs at the same time
  • Export your sketch to a CMake project providing an easy development bridge to Intel System Studio 2018
  • Integrates mraa, the hardware abstraction layer by Intel, into the Arduino core libraries enabling support for all Intel platforms

Intel Optane 900P Series SSD Launched for Desktop PCs

October 29th, 2017 11 comments

Intel and Micron first unveiled 3D XPoint technology (pronounced “crosspoint”) in 2015 with the promise of 1000x faster storage and 1000x better endurance than the NAND flash used in SSDs. The performance claim was later scaled back to about 7x better IOPS in a prototype, and Intel started to sell the technology under the Optane brand with the 375GB SSD DC P4800X for the enterprise market.

Since then, Intel entered the consumer market with 16GB and 32GB Optane M.2 cards, meant to be used as disk cache in compatible systems thanks to their high random I/O performance. The company has now announced the first consumer grade 3D XPoint SSDs for desktops and workstations with the Optane 900P Series, available in HHHL (CEM3.0) and U.2 15mm form factors, and with random I/O performance up to four times faster than competitive NAND-based SSDs.

Optane SSD 900P Series specifications:

  • Capacity – 280 to 480 GB
  • Interface – PCIe NVMe 3.0 x4
  • Form Factor – HHHL (CEM3.0) or 2.5″ U.2 15mm
  • Performance
    • Sequential Read/Write – Up to 2500/2000 MB/s
    • Random Read/Write – 550k/500k IOPS
  • R/W Latency  – 10μs
  • Power Consumption – Active: 14W; idle: 5W
  • Reliability / endurance
    • MTBF – 1.6 million hours
    • Endurance Rating – 8.76 PB written (480GB SSD); 5.11 PB written (280GB SSD)
    • Uncorrectable Bit Error Rate (UBER) – 1 sector per 10^17 bits read
  • Weight – HHHL: up to 230 grams; 2.5″ : up to 140 grams

Both the 280 and 480GB models have the same performance and reliability characteristics, except for the endurance ratings, where obviously you can write more data to the larger SSD before it wears out. With NAND based SSDs performance often scales with capacity, but that does not appear to be the case with 3D XPoint SSDs.
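For perspective, both endurance ratings work out to the same figure relative to capacity; the drive-writes-per-day reading below is my own arithmetic on Intel’s numbers, not a quoted spec:

  • 480GB model – 8.76 PB ÷ 480 GB ≈ 18,250 full drive writes, i.e. about 10 drive writes per day over 5 years
  • 280GB model – 5.11 PB ÷ 280 GB ≈ 18,250 full drive writes, the same ~10 drive writes per day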

Intel explains the Optane SSDs are suitable for “demanding storage workloads, including 3D rendering, complex simulations, fast game load times and more”. Anandtech reviewed the 280GB model, and only recommends it for higher-end desktop computers for tasks where storage is too slow or RAM is too small, but notes there aren’t that many such tasks on desktops.

Optane memory (the M.2 cards) is only supported by some processors and motherboards right now, although the next generation of Intel processors and NUCs will all support it, except for Gemini Lake processors at the lower end of the scale. The Optane SSD 900P Series, on the other hand, should work with any PCIe/NVMe compatible host equipped with a PCIe slot or U.2 connector.

Optane SSD 900P are pre-sold on Newegg for $389.99 (280GB) and $599.99 (480GB). You may find more details on Intel website.


Intel Speech Enabling Developer Kit Works with Alexa Voice Service, Raspberry Pi 3 Board

October 28th, 2017 4 comments

We’ve known Intel has been working on the Quark S1000 “Sue Creek” processor for voice recognition for several months. The S1000 SoC is based on two Tensilica LX6 cores with HiFi3 DSP, some speech recognition accelerators, and up to 8 microphone interfaces, which allow it to perform speech recognition locally. The solution can also be hooked to an application processor via SPI, I2S and USB (optional) when cloud based voice recognition is needed.

Intel has recently introduced their Speech Enabling Developer Kit working with Amazon Alexa Voice Service (AVS), featuring a “dual DSP with inference engine” – which must be the Quark S1000 – and an 8-mic array. The kit also includes a 40-pin cable to connect to the Raspberry Pi 3 board.


Intel only provided basic specifications for the kit:

  • Intel’s dual DSP with inference engine
  • Intel 8-mic circular array
  • High-performance algorithms for acoustic echo cancellation, noise reduction, beamforming and custom wake word engine tuned to “Alexa”
  • 6x Washers
  • 3x 6mm screws
  • 3x 40mm female-female standoffs
  • Raspberry Pi connector cable

I could not find detailed information to get started, except for the assembly guide shown in the video below. We do know that the kit will work with Amazon Alexa, and requires a few extra bits, namely a Raspberry Pi 3 board, an Ethernet cable, an HDMI cable and monitor, a USB keyboard and mouse, an external speaker, a micro USB power supply (at least 5V/1A), and a micro SD card.

The video also points to Intel’s Smart Home page for more details about software, but again I could not find instructions or guides there, except links to register for a developer workshop at Amazon re:Invent in Las Vegas on November 30, 2017.

Intel Speech Enabling Developer Kit can be pre-ordered for $399 directly on Intel’s website, with shipping planned for the end of November. The product is also listed on the Amazon Developer page, but again with little specific information about the hardware and how to use it. One can assume the workflow should be similar to other AVS devkits.

Thanks to Mustafa for the tip.

Google Clips is an A.I. Camera Powered by Movidius Myriad 2 VPU

October 5th, 2017 No comments

Most consumer cameras offer some way for the photographer to check the framing of a picture, such as a viewfinder or LCD display, before pressing the button. The first time I saw a consumer camera without such features was MeCam, a tiny snap-on camera that you can wear on your shirt, and just press a button to take a picture. Convenient, but not ideal, as subjects were often out of frame with the camera pointing at the wrong angle.

That was in 2013. Today, those cameras can be improved with artificial intelligence, and Google Clips is a camera without a viewfinder or LCD display that can allegedly take good photos – or short clips – automatically, acting in some ways like a human photographer, so that every human in the room / the whole family can be in the shot.

Google Clips specifications:

  • Vision Processing Unit – Movidius Myriad 2 VPU as found in the Intel Movidius Neural Compute Stick
  • Storage – 16 GB for photos
  • Camera
    • TBD?? megapixels; 1.55μm pixels;  130° field of view; auto focus; auto low lux/night mode.
    • Motion photos (JPEGs with embedded MP4s) @ 15 fps, MP4, GIF, JPEG. No audio.
  • Connectivity – WiFi direct and Bluetooth LE
  • USB – 1x USB type C port for charging
  • Battery – Good for 3 hours of smart capture
  • Dimensions – 49 x 49 x 20 mm
  • Weight – 42.2 grams without clip, 60.5 grams with clip

The camera works with the Google Clips app for “compatible mobile devices” running Android 7.0 Nougat or higher, such as the Google Pixel or Galaxy S7/S8, and iOS devices starting from the iPhone 6. Google Clips will ship with a clip stand, a USB-C to USB-A cable, a quick start guide, and a user guide.

Google Clips will sell for $249, and if you’re interested you can join the waiting list on the product page.