Short Demo with 96Boards SynQuacer 64-bit ARM Developer Box

Even if you are working on ARM platforms, you are still likely using an Intel or AMD x86 build machine, since there’s not really a good alternative in the ARM world. Linaro talked about plans to change that at Linaro Connect Budapest 2017 in March, and a few days ago, the GIGABYTE SynQuacer software development platform was unveiled with a Socionext SynQuacer SC2A11 24-core Cortex-A53 processor, and everything you’d expect from a PC tower: compartments for SATA drives, PCIe slots, memory slots, multiple USB 3.0 ports, and so on.

The platform was just demonstrated at Linaro Connect San Francisco right after the Linaro High Performance Computing keynotes by Kanta Vekaria, Technology Strategist, Linaro, and Yasuo Nishiguchi, Socionext’s Chairman & CEO.

If you have never used a system with more than 14 cores, you’ll sadly learn that the Tux logos at boot time are only shown on the first line, skipping the remaining 10 cores of the 24-core system. It was hard to stomach, but I’m recovering… 🙂

The demo showed a system with an NVIDIA graphics card connected to the PCIe x16 port and leveraging the open source Nouveau drivers, but it’s also possible to use it as a headless “developer box”. The demo system booted quickly into Debian + Linux 4.13. They then played a YouTube video, and ran top in the developer box showing all 24 cores and 32GB RAM. That’s it. They also took questions from the audience. We learned that the system can build the Linux kernel in less than 10 minutes, that they are working on SBSA compliance, and that the system will be available through the 96Boards website, with a complete build with memory and storage expected to cost less than $1,000. The idea is to use any off-the-shelf peripherals typically found in x86 PC towers. We still don’t know if they take MasterCard though… The video below is the full keynote with the demo starting at the 52:30 mark.

lvrp16

24 x 1GHz = 24GHz
24 GHz / 3 (Coffee Lake vs Cortex A53 IPC) = 8GHz
8GHz / 4 cores (Intel i3-8100) = performance equivalent of an Intel i3-8100 running at 2GHz
The i3 will use ~20W at 2GHz so the SC2A11 better use < 10W at the same performance level.
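
The estimate above can be written out as a short Python sketch. The 1GHz clock and the 3x IPC ratio are the figures used in the comment rather than measurements, the workload is assumed to be perfectly parallel, and the function name is made up for illustration.

```python
def equivalent_clock_ghz(cores, clock_ghz, ipc_ratio, target_cores):
    """Clock a target CPU would need to match the aggregate throughput.

    Assumes a perfectly parallel workload and a fixed IPC ratio, both
    simplifications taken from the estimate above.
    """
    aggregate_ghz = cores * clock_ghz            # 24 x 1GHz = 24GHz
    normalized_ghz = aggregate_ghz / ipc_ratio   # 24GHz / 3 = 8GHz
    return normalized_ghz / target_cores         # 8GHz / 4 cores = 2GHz


print(equivalent_clock_ghz(24, 1.0, 3.0, 4))  # prints 2.0
```

Changing `ipc_ratio` shows how sensitive the result is to that one assumption: at a ratio of 2 the same arithmetic gives 3GHz instead of 2GHz.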

bobby

“showing all 24-cores and 32MB RAM”
GB RAM I guess.

FransM

@lvrp16

Interesting math….
You are comparing apples and oranges.

Actually clock speed is just one factor in the equation.
It also depends on how many clocks an instruction takes, how much pipelining, how powerful the instructions are etc etc.
(apart from other things like memory speed and timing, caching etc etc etc)

Thinking of it a bit more: if your 24 cores are going to compete for the same memory location this will also slow down the system. And if your algorithm/program is not written for parallel execution, you’re hardly going to benefit from the additional cores.
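
The point about programs that are not written for parallel execution is commonly quantified with Amdahl's law. The comment does not name it, so take the following as an illustrative sketch only; the parallel fractions are arbitrary example values and memory contention is ignored.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for fraction in (0.5, 0.9, 0.99):
    speedup = amdahl_speedup(fraction, 24)
    print(f"{fraction:.0%} parallel on 24 cores: {speedup:.1f}x")
```

Under these assumptions a program that is half serial gains less than 2x from 24 cores, while one that is 99% parallel gets close to 20x.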

blu

@FransM
Nothing unusual in lvrp16’s calculations. Welcome to the world of embarrassingly parallel computations.

All the performance factors you mentioned are factored into the IPC (instructions per clock), which is usually task-dependent, but overall is indeed on the order of 2-3x better on a modern OoO CPU vs a modern in-order CPU.

@lvrp16
Power budget is decisively in favor of the SC2A11 – it’s a 5W part (or so they say).

blu

@cnxsoft
Curiously enough, we already have one such platform: it’s sitting on my workbench at home ; ) Yes, I’m referring to Marvell’s macchiatobin. Why Linaro pretends that platform does not exist is beyond me, given it too can have 32GB of RAM and it too builds the kernel in ~10 min.

Most importantly, it does not cost a grand.

theguyuk

@lvrp16
I suggest you are misleading yourself with the maths. 24 x 1GHz = 24GHz: in the real world you get 24 cores that max out at 1GHz each. You do not get 24GHz. As an example, an Allwinner H3 has 4 cores at 1.296GHz, yet it does not run at 5.184GHz. Also, as stated, for full benefit the software needs to use all cores; if the task cannot be broken down into smaller pieces, some cores sit idle.

tkaiser

@cnxsoft
The Socionext 24 core SoC according to one of your blog posts has ‘PCI Express Gen2, Root/Endpoint select, 4 lanes (2 systems/ for SoC IF)’, the above motherboard exposes 2 x PCIe x1, 2 USB3 ports using a µPD720201 (which is AFAIK PCIe attached) and there’s a mechanical x16 PCIe slot. If there’s no PCIe multiplexer somewhere on this board I wonder how there should be more than 1 PCIe lane usable in the x16 slot?

The MacchiatoBin exposes a PCIe 3 x4 slot which should be usable with PCIe GPUs using a 90 degree riser card.

When looking at single lane PCIe 2.x vs. x4 PCIe 3 (roughly eight times the theoretical bandwidth) at least I would give the MacchiatoBin a try first as desktop machine generously ignoring all the networking goodies 😉
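
As a back-of-the-envelope check of the theoretical bandwidth of a single PCIe 2.x lane against an x4 PCIe 3 link, here is a small Python sketch. It uses the standard per-lane transfer rates and line encodings (5 GT/s with 8b/10b, 8 GT/s with 128b/130b) and ignores protocol overhead; the function and variable names are hypothetical.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),     # PCIe 2.x: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s, 128b/130b encoding
}


def link_bandwidth_mb_s(generation, lanes):
    """Theoretical one-direction bandwidth in MB/s, ignoring protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * 1000 / 8 * lanes


gen2_x1 = link_bandwidth_mb_s(2, 1)
gen3_x4 = link_bandwidth_mb_s(3, 4)
print(f"PCIe 2.x x1: {gen2_x1:.0f} MB/s")
print(f"PCIe 3 x4:   {gen3_x4:.0f} MB/s")
print(f"ratio:       {gen3_x4 / gen2_x1:.1f}x")
```

With these figures the single PCIe 2.x lane comes out at 500 MB/s and the x4 PCIe 3 link at a bit under 4,000 MB/s, a ratio of a little under eight.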

tkaiser

@cnxsoft
Hmm… looking at the PCB picture, SATA is clearly provided by an ASM1061 or ASM1062, so it is also PCIe attached. Then I either did not understand the PCIe specs of the SC2A11 or there must be a PCIe multiplexer used somewhere.

ASM106x SATA performance is nothing I would call ‘stellar’, at least it’s way slower (especially random IO) than all the ‘native’ SATA 3.0 ports the Armada 8040 provides.

miniNodes

@cnxsoft – Great find on that link, thanks! We have been struggling to test a GPU on a different RTD1296 demo board, using a PCIe E-key to x1 adapter card. This provides some insight.

blu

@cnxsoft
Great find! Apparently my macchiato will remain headless for the foreseeable future.

tkaiser

@cnxsoft
Hehe, it seems you’re right stating that the above box as an ARM development platform is suitable to find/develop bugs/quirks that are long fixed on x86. An obvious web search came up with a similar (or same?) issue: patchwork.kernel.org/patch/9661993/

lvrp16

@theguyuk
The calculation is for total estimated throughput.

@blu
Let’s hope the manufacturer’s numbers are right. A newer design with the recently released Cortex-A55 would make a lot more sense in the market.

blu

@lvrp16
A55 would undoubtedly be better, considering it might even help them address some of their more dubious design decisions (e.g. 8 clusters of 3 cores each?). But these are future considerations; for the current timeframe A55 is too new.

@cnxsoft
Thanks again! Fingers crossed.