## Small, Quiet, and Cool

Spanning the range of computing from mobile, to microcontrollers, through to data plane with Cortex-A5

Raymond Yao, Product Marketing Manager, Processor Division

4Q 2010

## Cortex-A5: What's unique about it?

#### Small: Lowest area (cost) applications processor with internet capability

#### **Quiet and Cool:** Most energy efficient applications processor with internet capability

#### **Cortex-A5** is a Cortex-A Processor

Cortex-A processors feature virtual memory management for running advanced OSs such as Linux, Android, and Windows CE.

ARM Web applications will mostly be Cortex-A (ARMv7/NEON). Example: Firefox; Adobe Flash and AIR

## Cortex-A5 provides...

#### **Cortex-A5 in Mobile**

2012 Mobile Market

#### 2012-13 Entry Smartphones & Low cost feature phones

~80% of the market

- Must be low-cost, yet deliver performance of 2010 smartphones
- Cortex-A5: 1/3 Area & Power of A9
- ARMv7-A (Cortex-A8, A9) compatible

#### **Mobile audio**

- Leverages ARM and NEON software
- Software solution in 1-2mW or less
- Offload tasks from main CPU

### Cortex-A5: Enabling the \$100 Smartphone

- The most efficient low-cost application processor
- Full Internet connectivity and software compatibility
- Scalable performance, scalable power
- Enabling 1Bn+ smartphones

## **Cortex-A5 in Set-top Box**

## Cortex-A5 in Entry Level STB/DTV/DTA

- Low power, low cost Cortex-A5 processor bringing today's high end performance into the entry level products of 2013
- Very low standby/active power, less heat dissipation
- Improved Linux/Windows CE performance over ARM926
- Physically tagged caches remove OS cache clean
- TrustZone for piracy/content protection
- Scalable multiprocessor solution
- 1 ~ 4 Cores
- Strong Software Ecosystem Support
- Android, HTML5, Flash10.1, etc.
- Leverage success of Cortex-A8/A9 with full reuse of ecosystem

## Cortex-A5 in Entry Level STB/DTV/DTA

#### **Device Features**

- 700 MHz ~ 1GHz Cortex-A5 (40G)
- Performance comparable to or higher than Cortex-A8 600MHz
- Mali 200 GPU
- 64MB Flash / 256MB RAM
- 1Gbps Ethernet
- 802.11g Wi-Fi (optional)
- IR Receiver
- <5W Typical Power Usage

#### Multimedia

- 1080p Playback
- 3D: OpenGL ES 2.0 graphics
- MPEG-4 MP@L3
- Full DLNA Compliant
- HDMI + SPDIF Output

#### Web Experience

- Full browser
- Flash Player 10 support or Flash Lite 4 support
- Full HTML5 support

#### **Applications**

- Choice of Open Platforms
- HTML5, Flash 10, Qt, Android
- Access to Application Stores
- Access to Primary STB / Gateway
- Remote Desktop services
- Social Networking & Photos
- Over-The-Top Content

Cortex-A5 designed to enable Internet TV / 2<sup>nd</sup> TV Markets

## **Cortex-A5 in MCU**

#### **Cortex-A5 in MCU**

- Some MCU applications require cache and MMU, e.g. for full OS support – good fit for Cortex-A5
- Small area allows Cortex-A5 to be manufactured cost efficiently in larger geometries
- Mixed analog designs often used in older processes, e.g.
130, 90nm
- Cortex-A5 logic area similar to ARM9, realizable in older geometries
- High performance MCUs require higher frequencies
- Cortex-A5 supports 600MHz+ operation
- AXI couples with high speed DDR memories
- Compare with typical MCUs that pair up with Flash memory, limiting device speed to ~100MHz
- NEON unit allows limited onboard DSP capability

## Cortex-A5: low cost internet everywhere

## Small size enables latest Linux/Android/Win CE for extremely cost/power sensitive applications

- General purpose MPUs
- Smart energy meters
- Low cost printers
- STB audio systems
- Digital picture frames

## A5 NEON unit and single cycle multiply are good fit for DSP in MCU

Up to 3x faster than Cortex-M4 DSP functions, clock for clock

## **Cortex-A5 in Data Plane**

## **Data Plane Processing**

- Data plane processors inspect, forward, and process packets
- Number of services always expanding
- Mobile platforms example: voice, maps, video, audio, networked game, mail client, SMS, etc.
- Each packet can be seen as a separate thread: a parallelizable workload
- Frequently running in same OS stack as applications processor (e.g. Cortex-A9, Eagle, Atlas)
- Modest single-thread performance requirements

A data plane processor should be an **efficient** processor and have **architectural compatibility** with the applications processor

## Data Plane: Mobile system example

In current systems this already takes up a significant amount of the performance of the A8/A9 processor, e.g.
Cortex-A8/Cortex-A9

Cortex-A8 and A9 are inefficient cores for packet processing

## **Data Plane Processing in Mobile**

- Data has dominated cellular network traffic for several years
- Increasing numbers of apps all consuming data
- 4G is moving to all IP with data rates of up to 150Mb/s
- Must ensure quality of service for individual applications within phone
- Increasing need to shape traffic
- Traffic prioritization
- Application-specific VPN
- Pattern matching
- Traffic classification
- Mobile OSs are already spending time doing data plane functionality
- Data plane processing is becoming critical in phones

## 4G Mobile system example

- Packet processing (layers 2-4) done on more efficient cores
- Dual or Quad A5
- Apps processor (e.g. A9, A15) has more performance headroom
- Low-power mode with only data plane CPU(s) turned on enables streaming media without waking up apps cores

## Residential Gateway System Example

ARM Cortex-A5 dynamic scaling, 1 to 4 cores active: 80mW ~ 330mW

#### Data Plane: Cortex-A5 MP4 vs. Cortex-A9 MP2

#### Throughput/mm² and /mW

| Throughput / mm² | A5MP4 vs A9MP2 |
|------------------------|----------------|
| EEMBCv1 / mm² | 1.5x |
| EEMBCv2 / mm² | 1.4x |
| TCPMark / mm² (EEMBCv2) | 1.01x |
| IPMark / mm² (EEMBCv2) | 1.5x |

| Throughput / mW | A5MP4 vs A9MP2 |
|------------------------|----------------|
| EEMBCv1 / mW | 2.7x |
| EEMBCv2 / mW | 2.6x |
| TCPMark / mW (EEMBCv2) | 1.88x |
| IPMark / mW (EEMBCv2) | 2.73x |

- Cortex-A5 MP Quad core more efficient at networking per unit power and area
- Compared to Cortex-A9 MP Dual core
- 1.5x better performance/area
- 2.6x better performance/mW
- Similar overall area as Cortex-A9MPx2
- ~15% lower peak frequency

PPA numbers based on no NEON, FPU configuration. EEMBC results are in iterations of the given workload per second. Performance on multiple cores assumed to scale linearly, e.g.
2-core throughput is 2x 1-core throughput. Throughput/mW estimated at 950MHz, 40G, RVt only, no NEON unit. Power numbers are estimated using Dhrystone power consumption.

## **Cortex-A5 Uniprocessor Summary**

#### Most power efficient Cortex-A core

- Small (~ARM926 power/area)
- 1.58 DMIPS/MHz (> ARM11 performance)
- Yet adds Cortex-A class ability
- Thumb2, NEON, TrustZone
- High performance memory bus and TLBs

#### Hardmacro implementations end of 2010

- TSMC 40LP

#### Highly configurable

- Optional NEON / FPU
- L2 cache 128KB - 8MB
- Availability: Released

#### Cortex<sup>™</sup>-A5

Block diagram: ARMv7-A core (ARM ISA, Thumb2 ISA, TrustZone, Jazelle); Memory Management Unit; NEON SIMD engine; FPU (single + double precision float); ETM (I&D trace); Debug (data watchpoints, instruction breakpoints); 4-64K ICache; 4-64K DCache; 64-bit AXI bus interface.

| Estimated PPA | TSMC 40LP<br>(Trial) 1.1V | TSMC 40LP<br>(Hard Macro) 1.1V |
|-----------------------------|----------------------------------------------------------------|-----------------------------------------------------------------|
| Configuration | uP, no NEON, no FPU, 2x32K<br>L1, 12T, RVt, fast mem, perf opt | uP, w NEON + FPU, ETM, 2x32K<br>L1, 12T, RVt, FCI mem, perf opt |
| Frequency (MHz) | 532 | 600 |
| Performance (Agg. DMIPS) | 841 | 948 |
| Total area (mm²) | 0.59 | 0.59 |
| Power efficiency (DMIPS/mW) | 12 | 11 |

Results include 50ps clock jitter, +/-3% duty cycle, 10% OCV, 100ps hold margin, and rcworst parasitics.

## **Cortex-A5 MPCore Scalability**

## Cortex-A5 MP – up to 4 coherent Cortex-A5s

#### Includes:

- Snoop Control Unit (SCU) for coherency
- Interrupt controller
- Timers
- Accelerator coherency port
- 2<sup>nd</sup> AXI port

#### **Cortex-A5MP**

## **Cortex-A5 MPCore Summary**

#### Most power efficient Cortex-A core

- Small (~ARM926 power/area)
- 1.58 DMIPS/MHz (> ARM11 performance)
- Yet adds Cortex-A class ability
- Thumb2, NEON, TrustZone
- High performance memory bus and TLBs

#### Hardmacro implementations end of 2010

- TSMC 40LP
- Uniprocessor and 2xMP versions

#### Highly configurable

- 1-4 cores
- Optional NEON / FPU
- L2 cache 128KB - 8MB
- ACP for coherent I/O
- Availability: Released

#### **Cortex-A5 MPCore**

| Estimated PPA | TSMC 40LP<br>(Trialed) | TSMC 40G<br>(Estimated) |
|-----------------------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
| Configuration | 2x CPU, NEON, FPU, 2x32K L1,<br>64 IRQ, ACP, dual-AXI, ETM,<br>PL310, 12T, RVt, fast mem | 2x CPU, NEON, FPU, 2x32K L1,<br>64 IRQ, ACP, dual-AXI, PTM,<br>PL310, 12T, RVt, fast mem |
| Frequency (MHz) | 500 ~ 575 | 950 |
| Performance (Agg. DMIPS) | 1570 ~ 1820 | 3000 |
| Total area (mm²) | 2.4 | 2.4 |
| Power efficiency (DMIPS/mW) | 11 | 18.4 |

Results include 10% OCV and 50ps jitter. No overdrive.
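
The PPA tables quote aggregate DMIPS, frequency, and DMIPS/mW separately, so the 1.58 DMIPS/MHz headline figure can be cross-checked with simple arithmetic. The sketch below does that, and also derives the power each row implies (aggregate DMIPS divided by DMIPS/mW). The numbers are copied from the tables; the dictionary layout and function names are illustrative assumptions. The implied power values are a derivation, not figures stated in the slides, and inherit the Dhrystone-based caveats noted earlier.

```python
# Figures are copied from the PPA tables in this deck; the dictionary layout
# and function names are illustrative assumptions, not ARM tooling.
PPA = {
    "uniprocessor, 40LP trial": {
        "mhz": 532, "cores": 1, "dmips": 841, "dmips_per_mw": 12.0,
    },
    "uniprocessor, 40LP hard macro": {
        "mhz": 600, "cores": 1, "dmips": 948, "dmips_per_mw": 11.0,
    },
    "2x MPCore, 40G estimate": {
        "mhz": 950, "cores": 2, "dmips": 3000, "dmips_per_mw": 18.4,
    },
}


def dmips_per_mhz_per_core(row):
    """Aggregate DMIPS normalised by clock frequency and core count."""
    return row["dmips"] / (row["mhz"] * row["cores"])


def implied_power_mw(row):
    """Power implied by a table row: aggregate DMIPS / (DMIPS per mW)."""
    return row["dmips"] / row["dmips_per_mw"]


if __name__ == "__main__":
    for name, row in PPA.items():
        print(
            f"{name}: {dmips_per_mhz_per_core(row):.2f} DMIPS/MHz/core, "
            f"~{implied_power_mw(row):.0f} mW implied"
        )
```

Each of the three rows works out to about 1.58 DMIPS/MHz per core, matching the summary slides. The 40LP MPCore column is left out because it is quoted as a range rather than a single operating point.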
## **Cortex-A5 Delivery**

#### Cortex-A5 is delivered as fully synthesizable Verilog

#### Cortex-A5 licensing options

- Cortex-A5 Uniprocessor
- Cortex-A5 Multiprocessor
- Cortex-A5 Floating Point Unit
- Cortex-A5 NEON Unit
- Cortex-A5 Embedded Trace Unit

Available now. Hard macros coming soon.

#### **Cortex-A5 Schedule**

#### **Maturity Release in 3Q10**

1<sup>st</sup> silicon received in July

#### **ARM TestChip in Development**

- Based on eASIC technology
- 2xMP, NEON
- 32K/32K/256K
- Includes local SRAM
- Delivery 4Q10

## **END**

## **Thank You!**

Questions, contact raymond.yao@arm.com