Archive

Posts Tagged ‘suse’

Qualcomm Centriq 2400 ARM SoC Launched for Datacenters, Benchmarked against Intel Xeon SoCs

November 9th, 2017 13 comments

Qualcomm Centriq 2400 ARM Server-on-Chip has been four years in the making. The company announced sampling in Q4 2016 using 10nm FinFET process technology, with the SoC featuring up to 48 Qualcomm Falkor ARMv8 CPU cores optimized for datacenter workloads. More recently, Qualcomm provided a few more details about the Falkor core, fully customized with a 64-bit only micro-architecture based on ARMv8 / AArch64.

The SoC has now formally launched, with the company announcing commercial shipments of Centriq 2400 SoCs.

Qualcomm Centriq 2400 key features and specifications:

  • CPU – Up to 48 physical ARMv8 compliant 64-bit only Falkor cores @ 2.2 GHz (base frequency) / 2.6 GHz (peak frequency)
  • Cache – 64 KB L1 instructions cache with 24 KB single-cycle L0 cache, 512 KB L2 cache per duplex; 60 MB unified L3 cache; Cache QoS
  • Memory – 6 channels of DDR4 2667 MT/s for up to 768 GB RAM; 128 GB/s peak aggregate bandwidth; inline memory bandwidth compression
  • Integrated Chipset – 32 PCIe Gen3 lanes with 6 PCIe controllers; low speed I/Os; management controller
  • Security – Root of trust, EL3 (TrustZone) and EL2 (hypervisor)
  • TDP – < 120W (~2.5 W per core)

Click to Enlarge

The SoC is ARM SBSA v3 compliant, meaning it can run any compliant operating system without having to resort to “cute embedded nonsense hacks”. The processor is optimized for cloud workloads, and the company explains the SoC has already been demonstrated for the following tasks:

  • Web front end with HipHop Virtual Machine
  • NoSQL databases including MongoDB, Varnish, Scylladb
  • Cloud orchestration and automation including Kubernetes, Docker, metal-as-a-service
  • Data analytics including Apache Spark
  • Deep learning inference
  • Network function virtualization
  • Video and image processing acceleration
  • Multi-core electronic design automation
  • High throughput compute bioinformatics
  • Neural class networks
  • OpenStack Platform
  • Scaleout Server SAN with NVMe
  • Server-based network offload

Three Qualcomm Centriq 2400 SKUs are available today:

  • Centriq 2434 – 40 cores @ 2.3 / 2.5 GHz; 50 MB L3 cache; 110W TDP
  • Centriq 2452 – 46 cores @ 2.2 / 2.6 GHz; 57.5 MB L3 cache; 120W TDP
  • Centriq 2460 – 48 cores @ 2.2 / 2.6 GHz; 60 MB L3 cache; 120W TDP
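Dividing each SKU’s TDP by its core count reproduces the roughly 2.5 W per core figure quoted in the specifications above; a quick sanity check:

```shell
# TDP per core for the three Centriq 2400 SKUs (numbers from the list above)
awk 'BEGIN {
  printf "Centriq 2434: %.2f W/core\n", 110/40   # -> 2.75
  printf "Centriq 2452: %.2f W/core\n", 120/46   # -> 2.61
  printf "Centriq 2460: %.2f W/core\n", 120/48   # -> 2.50
}'
```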

Qualcomm Centriq 2460 (48 cores) was compared to an Intel Xeon Platinum 8160 with 24 cores/48 threads (150W TDP), and found to perform a little better in both integer and floating point benchmarks.

The most important metrics for server SoCs are performance per thread, performance per watt, and performance per dollar, so Qualcomm pitted the Centriq 2460, 2452, and 2434 respectively against the Intel Xeon Platinum 8180 (28 cores/205W TDP), Xeon Gold 6152 (22 cores/140W TDP), and Xeon Silver 4116 (12 cores/85W TDP). Performance per watt was found to be significantly better for the Qualcomm chips in the SPECint_rate2006 benchmark.

Performance per dollar of the Qualcomm SoCs looks excellent too, but…

Qualcomm took Xeon SoC pricing from Intel’s ARK, and in the past, prices there have not reflected the real selling prices of the chips, at least for low power Apollo Lake / Cherry Trail processors.

This compares to the prices for Centriq 2434 ($880), Centriq 2452 ($1,373), and Centriq 2460 ($1,995).

Qualcomm also boasted better performance per mm², and a typical power consumption for Centriq 2460 under load of around 60W, well below the 120W TDP. Idle power consumption is around 8 Watts using C1 mode, and under 4 Watts when all idle states are enabled.

If you are wary of company provided benchmarks, Cloudflare independently tested Qualcomm Centriq and Intel Skylake/Broadwell servers using OpenSSL speed, compression algorithms (gzip, brotli…), Go, NGINX web server, and more.
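Cloudflare’s OpenSSL numbers come from the stock `openssl speed` tool, which can fork one worker per core to measure aggregate throughput. A sketch of such a run (the cipher choice here is our own illustration, not necessarily the one Cloudflare used):

```shell
# Multi-process AES-GCM throughput, one worker per CPU core.
# -multi forks N processes, -seconds limits each measurement,
# -evp selects the EVP path so hardware crypto extensions are used.
openssl speed -seconds 1 -multi "$(nproc)" -evp aes-128-gcm
```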

Multicore OpenSSL Performance

Usually, Intel single core performance is better, but since ARM has more cores, multi-threaded performance is often better on ARM. Here’s their conclusion:

The engineering sample of Falkor we got certainly impressed me a lot. This is a huge step up from any previous attempt at ARM based servers. Certainly core for core, the Intel Skylake is far superior, but when you look at the system level the performance becomes very attractive.

The production version of the Centriq SoC will feature up to 48 Falkor cores, running at a frequency of up to 2.6GHz, for a potential additional 8% better performance.

Obviously the Skylake server we tested is not the flagship Platinum unit that has 28 cores, but those 28 cores come both with a big price and over 200W TDP, whereas we are interested in improving our bang for buck metric, and performance per watt.

Currently my main concern is weak Go language performance, but that is bound to improve quickly once ARM based servers start gaining some market share.

Both C and LuaJIT performance is very competitive, and in many cases outperforms the Skylake contender. In almost every benchmark Falkor shows itself as a worthy upgrade from Broadwell.

The largest win by far for Falkor is the low power consumption. Although it has a TDP of 120W, during my tests it never went above 89W (for the go benchmark). In comparison Skylake and Broadwell both went over 160W, while the TDP of the two CPUs is 170W.

Back to software support, the SoC is supported by a large ecosystem with technologies such as memcached, MongoDB, MySQL, …, cloud management solutions such as OpenStack and Kubernetes, programming languages (Java, Python, PHP, Node, Golang…), tools (GCC/LLVM, GDB…), virtualization solutions including KVM, Xen and Docker, as well as operating systems like Ubuntu, Red Hat, SUSE, and CentOS.

Qualcomm is already working on its next generation SoC, Firetail, based on the Qualcomm Saphira core, but no details have been provided yet.

Thanks to David for the links.

GIGABYTE MA10-ST0 Server Motherboard is Powered by Intel Atom C3958 “Denverton” 16-Core SoC

August 15th, 2017 27 comments

Last year, we wrote about Intel Atom C3000 series processor for micro-servers with the post also including some details about MA10-ST0 motherboard. GIGABYTE has finally launched the mini-ITX board with an unannounced Atom C3958 16-core Denverton processor.

Click to Enlarge

GIGABYTE MA10-ST0 server board specifications:

  • Processor – Intel Atom C3958 16-core processor @ up to 2.0GHz with 16MB L2 cache (31W TDP)
  • System Memory – 4x DDR4 slots for dual channel memory @ 1866/2133/2400 MHz, with up to 128GB ECC R-DIMM, or up to 64GB ECC/non-ECC UDIMM
  • Storage
    • 32GB eMMC flash
    • 4x Mini-SAS ports for up to 16x SATA 6Gb/s ports
    • 2x Mini-SAS ports are shared with PCIe x8 slot
  • Connectivity
    • 2x 10Gb/s SFP+ LAN ports
    • 2x 1Gb/s LAN ports (Intel I210-AT)
    • 1x 10/100/1000 management LAN
  • Video – VGA port up to 1920×1200 @ 60 Hz, 32bpp; Aspeed AST2400 chipset with 2D video graphics adapter with PCIe bus interface
  • USB – 2x USB 2.0 ports
  • Expansion Slots – 1x PCIe x8 (Gen3 x8 bus) slot; shared with Mini-SAS ports, Mini_CN2, Mini_CM3
  • Misc
    • 1x CPU fan header, 4x system fan headers
    • 1x TPM header with LPC interface
    • 1x Front panel header
    • 1x HDD back plane board header
    • 1x JTAG BMC header
    • 1x Clear CMOS jumper
    • 1x IPMB connector
    • 1x PMBus connector
    • 1x COM (RS-232)
    • Power and ID buttons with LEDs; status LED
  • Board Management – Aspeed AST2400 management controller; Avocent MergePoint IPMI 2.0 web interface
  • Power Supply – 1x 24-pin ATX main power connector; 1x 8-pin ATX 12V power connector
  • Dimensions –  170 x 170 mm (Mini-ITX form factor)
  • Temperature Range – 10 to 40°C
  • Relative Humidity – 8-80% (non-condensing)

The dual core Atom C3338 is the only processor listed on Intel’s “formerly Denverton” ARK page, with no info about the 16-core Atom C3958 processor so far.

Click to Enlarge

The board is said to support Windows Server 2016, Red Hat Enterprise Linux Server 7.1, SuSE Linux Enterprise Server 12, Ubuntu 14.04.2 LTS, Fedora 22, and CentOS 7.1. The board is sold with an I/O shield and a quick start guide. There’s no word about pricing or availability on the product page, but Anandtech reports that the “board is essentially ready to go, and interested parties should get in contact with their local reps”. For reference, the SuperMicro A2SDI-H-TP4F-O board based on the same processor is sold for $820+ on Atacom.

SolidRun MACCHIATOBin Mini-ITX Networking Board is Now Available for $349 and Up

April 24th, 2017 32 comments

SolidRun MACCHIATOBin is a mini-ITX board powered by a Marvell ARMADA 8040 quad core Cortex A72 processor @ up to 2.0 GHz, designed for networking and storage applications thanks to 10 Gbps, 2.5 Gbps, and 1 Gbps Ethernet interfaces, as well as three SATA ports. The company is now taking orders for the board (FCC waiver required), with prices starting at $349 with 4GB RAM.

MACCHIATOBin board specifications:

  • SoC – ARMADA 8040 (88F8040) quad core Cortex A72 processor @ up to 2.0 GHz with accelerators (packet processor, security engine, DMA engines, XOR engines for RAID 5/6)
  • System Memory – 1x DDR4 DIMM with optional ECC and single/dual chip select support; up to 16GB RAM
  • Storage – 3x SATA 3.0 port, micro SD slot, SPI flash, eMMC flash
  • Connectivity – 2x 10Gbps Ethernet via copper or SFP, 2.5Gbps via SFP, 1x Gigabit Ethernet via copper
  • Expansion – 1x PCIe-x4 3.0 slot, Marvell TDM module header
  • USB – 1x USB 3.0 port, 2x USB 2.0 headers (internal),  1x USB-C port for Marvell Modular Chip (MoChi) interfaces (MCI)
  • Debugging – 20-pin connector for CPU JTAG debugger, 1x micro USB port for serial console, 2x UART headers
  • Misc – Battery for RTC, reset header, reset button, boot and frequency selection, fan header
  • Power Supply – 12V DC via power jack or ATX power supply
  • Dimensions – Mini-ITX form factor (170 mm x 170 mm)

Click to Enlarge

The board ships with either 4GB or 16GB DDR4 memory, a micro USB cable for debugging, 3 heatsinks, an optional 12V DC/110 or 220V AC power adapter, and an optional 8GB micro SD card. The company also offers a standard mini-ITX case for the board. The board supports mainline Linux or Linux 4.4.x, mainline U-Boot or U-Boot 2015.11, UEFI (Linaro UEFI tree), Yocto 2.1, SUSE Linux, netmap, DPDK, OpenDataPlane (ODP) and OpenFastPath. You’ll find software and hardware documentation in the Wiki.

The Wiki actually shows the board for $299 without any memory, but if you go to the order page, you can only order a version with 4GB RAM for $349, or one with 16GB RAM for $498 with the optional micro SD card and power adapter bringing the price up to $518.

Cavium introduces 54 cores 64-bit ARMv8 ThunderX2 SoC for Servers with 100GbE, SATA 3, PCIe Gen3 Interfaces

June 1st, 2016 5 comments

Cavium announced their first 64-bit ARM Server SoCs with the 48-core ThunderX at Computex 2014. Two years later, the company has now introduced the second generation, aptly named ThunderX2, with 54 64-bit ARM cores @ up to 3.0 GHz and promising two to three times more performance than the previous generation.

Cavium_ThunderX2

Key features of the new server processor include:

  • 2nd generation full custom Cavium ARM core; multi-issue, fully out-of-order (OOO); 2.4 to 2.8 GHz in normal mode, up to 3 GHz in Turbo mode
  • Up to 54 cores per socket delivering > 2-3X socket level performance compared to ThunderX
  • Cache – 64K I-Cache and 40K D-Cache, highly associative; 32MB shared Last Level Cache (LLC).
  • Single and dual socket configuration support using 2nd generation of Cavium Coherent Interconnect with > 2.5X coherent bandwidth compared to ThunderX
  • System Memory
    • 6x DDR4 memory controllers per socket supporting up to 3 TB RAM in dual socket configuration
    • Dual DIMM per memory controller, for a total of 12 DIMMs per socket.
    • Up to 3200MHz in 1 DPC and 2966MHz in 2 DPC configuration.
  • Full system virtualization for low latency from virtual machine to IO enabled through Cavium virtSOC technology
  • Next Generation IO
    • Integrated 10/25/40/50/100GbE network connectivity.
    • Multiple integrated SATAv3 interfaces.
    • Integrated PCIe Gen3 interfaces, x1, x4, x8 and x16 support.
  • Integrated Hardware Accelerators
    • OCTEON style packet parsing, shaping, lookup, QoS and forwarding.
    • Virtual Switch (vSwitch) offload.
    • Virtualization, storage and NITROX V security.
  • Manufacturing Process – 14 nm FinFET

Cavium_ThunderX2_SKUs

Just like for Cavium ThunderX, four revisions (SKUs) will be provided to match specific requirements, all supporting 10/25/40/50/100GbE connectivity:

  • ThunderX2_CP for cloud compute workloads: private and public clouds, web serving, web caching, web search, and commercial HPC workloads such as computational fluid dynamics (CFD) and reservoir modeling. This family also includes PCIe Gen3 interfaces, and accelerators for virtualization and vSwitch offload.
  • ThunderX2_ST optimized for big data, cloud storage, massively parallel processing (MPP) databases and data warehousing. This family supports multiple PCIe Gen3 interfaces, SATAv3 interfaces, and hardware accelerators for data protection, integrity, security, and efficient data movement.
  • ThunderX2_SC optimized for secure web front-ends, security appliances and cloud RAN type workloads. This family supports multiple PCIe Gen3 interfaces, as well as Cavium’s NITROX security technology with acceleration for IPSec, RSA and SSL.
  • ThunderX2_NT optimized for media servers, scale-out embedded applications and NFV type workloads. This family includes  OCTEON style hardware accelerators for packet parsing, shaping, lookup, QoS and forwarding.

The processor complies with Server Base Boot Requirements (SBBR, including UEFI and ACPI support) and SBSA Level 2, and will support Ubuntu 16.04 LTS and later, Red Hat Early Access for ARMv8, SUSE SLES SP2 and later, CentOS 7.2 and later, and FreeBSD 11.0 and later.

Charbax interviewed the company at Computex 2016 in the 20-minute video below, where you can also see a Gigabyte G220-T60 server combining ThunderX with an Nvidia Tesla GPU (at the 7:20 mark) for “high performance compute applications”, as well as other servers based on the first generation ThunderX SoC.

I could not find when the SoC will be available. More details can be found on the Cavium ThunderX2 product page.

openSUSE 12.2 for ARM is Now Available for Beagleboard, Pandaboard, Efika MX and More

November 7th, 2012 1 comment

The first stable release of openSUSE for ARM has just been announced. openSUSE 12.2 for ARM is officially available for the Beagleboard, Beagleboard xM, Pandaboard, Pandaboard ES, and Versatile Express (QEMU), and the rootfs can be mounted with chroot, while “best effort” ports have been made for the Calxeda Highbank server, i.MX53 Loco development board, CuBox computer, Origen board and Efika MX smart top.

Work is also apparently being done on a Raspberry Pi port which should be available for the next release.

openSUSE developers explain that almost all openSUSE packages (about 5000) build and run on these platforms. Visit “openSUSE on your ARM board” for download links and instructions for a specific ARM board. More details are available on the wiki page. openSUSE has limited resources for ARM development, so if you’d like to help with development (e.g. fixing builds), visit the ARM distribution howto page to find out how to get involved.

Since I don’t own any of the supported boards, but still want to give it a try, I’ll use the chroot method in a virtual machine running Ubuntu 12.04. There are two images available:

  • JeOS (Just Enough Operating System) image for a minimal system (openSUSE-12.2-ARM-JeOS-rootfs-*.tbz)
  • XFCE image for a graphical system (openSUSE-12.2-ARM-XFCE-rootfs-*.tbz)

Let’s go for the XFCE image (743 MB):
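Something along these lines fetches and unpacks the image; the exact mirror path below is an assumption, so check the “openSUSE on your ARM board” page for the current location:

```shell
# Download the XFCE root file system (URL is an assumption, see the download page)
wget http://download.opensuse.org/ports/armv7hl/distribution/12.2/images/openSUSE-12.2-ARM-XFCE-rootfs.armv7l.tbz

# Unpack it into a directory that will become our chroot
mkdir opensuse-arm
sudo tar xjf openSUSE-12.2-ARM-XFCE-rootfs.armv7l.tbz -C opensuse-arm
```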

After installation, prepare the environment and run chroot:
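On an x86 host this needs QEMU’s user-mode ARM emulator so the ARM binaries inside the chroot can execute. A sketch, assuming the rootfs was extracted to opensuse-arm and the qemu-user-static package is installed on the Ubuntu host:

```shell
# Copy the static ARM emulator into the chroot so ARM binaries run via binfmt_misc
sudo cp /usr/bin/qemu-arm-static opensuse-arm/usr/bin/

# Make kernel interfaces and DNS resolution available inside the chroot
sudo mount --bind /proc opensuse-arm/proc
sudo mount --bind /dev opensuse-arm/dev
sudo cp /etc/resolv.conf opensuse-arm/etc/resolv.conf

# Enter the openSUSE environment
sudo chroot opensuse-arm /bin/bash
```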

We can now run a few commands to show that we are running openSUSE (zypper is the equivalent of apt-get in SUSE):
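Inside the chroot, zypper takes the role apt-get plays on Ubuntu; a few illustrative commands (the package chosen here is arbitrary):

```shell
# Confirm we are inside openSUSE 12.2
cat /etc/SuSE-release

# Refresh repository metadata (the apt-get update equivalent)
zypper refresh

# Install a package (the apt-get install equivalent)
zypper install -y nano
```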

There seem to be some problems with some repositories, but it basically works. I tried to run startx, but it does not work within the chroot (probably because Xorg does not work in QEMU yet). It’s also possible to use the minimal JeOS image with QEMU emulating a Cortex A9 or A15 Versatile Express board.

Wyse Announced Two Linux Thin clients: Z50S and Z50D

August 31st, 2011 No comments

Wyse Technology announced at VMworld 2011 that its fastest thin clients ever, the Z90D7 and Z90DW, are now shipping. You can have a look at AMD Embedded G-Series mini-PC, motherboard and thin client for details about those devices. In addition, Wyse also introduced two new Linux-based members of its Z class family: the Wyse Z50S and Wyse Z50D. Both thin clients run Wyse Enhanced SUSE Linux Enterprise.

The press release also indicated that “the Z50 thin clients are built on the same exact advanced single and dual core processor hardware platform as the Wyse Z90 thin clients, the upcoming Linux-based Wyse Z50 promises more of the same industry leading power and capability on an enterprise-class Linux operating system”.

Wyse did not provide further details, but based on the statement above, we can probably safely assume that the Z50S will use the single core AMD G-T52R, the Z50D will be powered by the dual core AMD G-T56N, and both will have the same (or very similar) specifications as the Z90D7 and Z90DW, that is:

  • Processor – Single core AMD G-T52R @ 1.5 GHz or dual core AMD G-T56N @ 1.6 GHz, both with AMD Radeon HD 6310 graphics
  • Memory – 2GB flash / 1GB RAM; expandable up to 32GB flash / 4GB RAM
  • I/O peripheral support
    • 1x DisplayPort (optional DisplayPort to DVI-I adapter available, sold separately)
    • 1x DVI-I port; DVI to VGA (DB-15) adapter included
    • 6x USB ports in total – 4x USB 2.0 ports (two front, two rear), 2x SuperSpeed USB 3.0 ports on rear (backwards compatible with USB 2.0)
    • Enhanced USB keyboard with Windows keys (104 keys) and PS/2 mouse port; PS/2 optical mouse included
    • Factory option – legacy connectivity adds 2 serial ports, 1 parallel port and 1 PS/2 port
  • Networking – 10/100/1000 Gigabit Ethernet; factory options: internal 802.11 A/B/G and dual-band N wireless, Bluetooth 2.1+EDR, fiber NIC network connectivity (available Q2 2011)
  • Display – VESA monitor support with Display Data Control (DDC) for automatic setting of resolution and refresh rate; DisplayPort up to 2560×1600 @ 60 Hz; DVI-I up to 1920×1200 @ 60 Hz; dual display at 1920×1200 @ 60 Hz
  • Audio – Output: 1/8-inch mini jack, full 16-bit stereo, 48 kHz sample rate, digital audio out, internal mono speaker; Input: 1/8-inch mini jack, 8-bit stereo microphone

Bootloader to OS with Unified Extensible Firmware Interface (UEFI)

August 22nd, 2011 No comments

Unified Extensible Firmware Interface (UEFI) is a specification detailing an interface that hands off control of the system from the pre-boot environment (i.e. after the system is powered on, but before the operating system starts) to an operating system such as Windows or Linux. UEFI aims to provide a clean interface between operating systems and platform firmware at boot time, and supports an architecture-independent mechanism for initializing add-in cards.

UEFI will, over time, replace the vendor-specific BIOS. It also allows for fast boot and supports large hard drives (> 2.2 TB).
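On a running Linux system you can check which firmware path was used to boot, since the kernel exposes a UEFI-specific directory in sysfs only when it was started through UEFI firmware:

```shell
# /sys/firmware/efi exists only when the kernel was booted via UEFI
if [ -d /sys/firmware/efi ]; then
    echo "Booted via UEFI"
else
    echo "Booted via legacy BIOS (or UEFI in CSM mode)"
fi
```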

There are several documents fully defining the UEFI Specification, API and testing requirements:

  1. The UEFI Specification (version 2.3.1) describes an interface between the operating system (OS) and the platform firmware. It describes the requirements for the following components, services and protocols:
    • Boot Manager
    • EFI System Table
    • GUID Partition Table (GPT) Disk Layout
    • Services — Boot Services and Runtime Services
    • Protocols — EFI Loaded Image, Device Path, UEFI Driver Model, Console Support, Media Access, PCI Bus Support, SCSI Driver Models and Bus Support, iSCSI Boot, USB Support, Debugger Support
    • Compression Algorithm Specification
    • ACPI Protocols
    • EFI Byte Code Virtual Machine
    • Network Protocols — SNP, PXE and BIS; Managed Network; VLAN and EAP; TCP, IP, IPsec, FTP and Configurations; ARP and DHCP; UDP and MTFTP
    • Security — Secure Boot, Driver Signing and Hash
    • Human Interface Infrastructure (HII) — Overview, HII Protocols, HII Configuration Processing and Browser Protocol
    • User Identification
    • Firmware Management Protocol
  2. The UEFI Platform Initialization Specification (version 1.2) is composed of 5 documents which defines the pre EFI initialization core interface, the driver execution environment core interface,  the shared architectural elements, the system management mode core interface and the standards to be used with the Platform Initialization (PI) specifications.
  3. The UEFI Shell Specification (Version 2.0) provides an API, a command prompt and a rich set of commands that extend and enhance the UEFI Shell’s capability.
  4. The UEFI Platform Initialization Distribution Packaging Specification (Version 1.0) defines the overall architecture and external interfaces that are required for distribution of UEFI/PI source and binary files.
  5. The UEFI Self Certification Test (SCT) Package provides the tools to allow companies or individuals who implement UEFI to test their products and services. It contains 6 files:
    • ReleaseNote.txt – Release note for the entire package
    • UefiSctEdkII-Dev.zip – UEFI SCT agent source package.
    • Ems-Dev.zip – Ems source package.
    • IHV-SCT_Binary.zip – Toolset for Independent Hardware Vendors (IHV) to validate UEFI implementations on IA32, Itanium Processor Family (IPF) and EM64T based platforms for compliance to the UEFI Specification.
    • UEFI-SCT_Binary.zip – Toolset with two usage models: one is native mode, where the SCT is invoked as an EFI application from a local EFI Shell; the other is passive mode, which executes the UEFI SCT Agent in the EFI Shell and runs all test cases from the UEFI Management Side (EMS).
    • SCT_1_1_Specs.zip – SCT Specifications 1.1

This is a rather complex specification, as all documents comprise several thousand pages.

UEFI: Bootloader to Operating System Interface

Unified Extensible Firmware Interface's position in the software stack.

Current UEFI Implementations and Resources.

Many (most?) operating systems are now UEFI compliant. For example, GRUB2 and Linux follow the UEFI specifications. Here’s a non-exhaustive list of UEFI compliant products: Windows 7, Windows Server 2008 R2, Ubuntu (10.04 LTS and greater), MeeGo, Red Hat, SUSE, Fedora, VMware etc…

Tianocore is an open source implementation of UEFI specification and provides the “Intel EDK II Application Development Kit for including the Standard C Libraries in UEFI Shell Applications”.

Linaro (Linux on ARM) is also considering work on a UEFI implementation, as it is apparently very important for ARM servers and virtualization companies (such as Citrix). For details about Linaro’s UEFI implementation, please visit: https://blueprints.launchpad.net/linaro/+spec/linaro-kernel-o-uefi

For further information, Intel also recommended those two books in one of their presentations: