ARM (and its partners) were arguably one of the major causes of the present day smartphone revolution. While AMD and Intel focused on using Moore’s Law to drive higher and higher performing CPUs, ARM and its partners used the same physics to drive integration and lower power. The result was ultimately the ARM11 and Cortex A-series CPU cores that began the revolution and continue to power many smartphones today. With hopes of history repeating itself, ARM is just as focused on building an even smaller, even lower power family of CPU cores under the Cortex M brand.

We’ve talked about ARM’s three major families of CPU cores before: Cortex A (applications processors), Cortex R (real-time processors) and Cortex M (embedded/microcontrollers). Although Cortex A is what we mostly talk about, Cortex M is becoming increasingly important as compute is added to more types of devices.

Wearables are an obvious fit for Cortex M, yet the initial launch of Android Wear devices bucked the trend and implemented Cortex A based SoCs. A big part of that is likely due to the fact that the initial market for an Android Wear device is limited, and thus a custom designed SoC is tough to justify from a financial standpoint (not to mention the hardware requirements of running Android outpace what a Cortex M can offer). Look a bit earlier in wearable history and you’ll find a good number of Cortex M based designs, including the FitBit Force and the Pebble Steel. I figured it’s time to put the Cortex M’s architecture, performance and die area in perspective.

We’re very much in the early days of the evolution of Cortex M. The family itself has five very small members: M0, M0+, M1, M3 and M4. For the purposes of this article we’ll be focusing on everything but Cortex M1. The M1 is quite similar to the M0 but focuses more on FPGA designs.

Before we get too far down the architecture rabbit hole it’s important to provide some perspective. At a tech day earlier this year, ARM presented this data showing Cortex M die area:

By comparison, a single 40nm Cortex A9 core would be in the range of roughly 2.5mm^2. ARM originally claimed the Cortex A7 would be around 1/3 - 1/2 of the area of a Cortex A8, and the Cortex A9 is roughly equivalent to the Cortex A8 in terms of die area, putting a Cortex A7 at 0.83mm^2 - 1.25mm^2. In any case, with Cortex M we’re talking about an order of magnitude smaller CPU core sizes.
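The Cortex A7 estimate above is simple arithmetic, and a minimal sketch makes the steps explicit. The function and constant names here are made up for illustration; the only inputs are the 2.5mm^2 figure and the 1/3 - 1/2 ratio quoted in the text.

```python
# Die-area estimate built only from the figures quoted in the article (mm^2).
A9_AREA_MM2 = 2.5           # approximate area of one 40nm Cortex A9 core
A8_AREA_MM2 = A9_AREA_MM2   # the article treats A8 and A9 as roughly equal

def a7_area_range(a8_area_mm2):
    """Return (low, high) Cortex A7 area using ARM's 1/3 to 1/2 claim."""
    return a8_area_mm2 / 3, a8_area_mm2 / 2

low, high = a7_area_range(A8_AREA_MM2)
print(f"Cortex A7 estimate: {low:.2f} - {high:.2f} mm^2")
# prints "Cortex A7 estimate: 0.83 - 1.25 mm^2"
```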

The Cortex M0 in particular is small enough that SoC designers may end up sprinkling in multiple M0 cores in case they need the functionality later on. With the Cortex M0+ we’re talking about less than a hundredth of a square millimeter in die area, so even the tightest budgets can afford a few of these cores.

In fact, entire SoCs based on Cortex M CPU cores can be the size of a single Cortex A core. ARM provided this shot of a Freescale Cortex M0+ design in the dimple of a golf ball:

ARM wouldn’t provide me with comparative power metrics for Cortex M vs. Cortex A series parts, but we do have a general idea about performance:

Estimated Core Performance
  Core                  DMIPS/MHz
  ARM Cortex M0/M0+     0.84/0.94
  ARM Cortex M3/M4      1.25
  ARM11                 1.25
  ARM Cortex A7         1.9
  ARM Cortex A9         2.5
  Qualcomm Krait 200    3.3

In terms of DMIPS/MHz, Cortex M parts can actually approach some pretty decent numbers. A Cortex M4 can offer similar DMIPS/MHz to an ARM11 (admittedly, DMIPS/MHz is a poor indicator of overall performance). The real performance differences come into play when you look at shipping frequencies, as well as the type of memory interface built around the CPU. Cortex M designs tend to be largely SRAM and flash based, with no actual DRAM. You'll note that the M3/M4 per clock performance is identical. That's because the bulk of what the M4 adds comes in the form of additional hardware instructions that Dhrystone doesn't exercise.

Instruction set compatibility varies depending on the Cortex M model we’re talking about. The M0 and M0+ both implement ARM’s v6-M instruction profile, while the M3 and M4 support ARM’s v7-M. As you go up the family in terms of performance you get access to more instructions (M3 adds hardware divide, M4 adds DSP and FP instructions):

Each Cortex M chip offers a superset of the previous model’s instructions. So a Cortex M3 should theoretically be able to execute code for a Cortex M0+ (but not necessarily vice versa).
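One way to picture the superset relationship is as set containment. The sketch below is illustrative only: the feature labels are shorthand taken from the text above, not real instruction lists, and `can_run` is a hypothetical helper.

```python
# Shorthand feature sets per core; labels are illustrative, not instruction lists.
V6M = frozenset({"v6-M base"})
V7M = V6M | {"v7-M base", "hardware divide"}

FEATURES = {
    "M0":  V6M,
    "M0+": V6M,
    "M3":  V7M,
    "M4":  V7M | {"DSP", "FP"},
}

def can_run(target, built_for):
    """True if `target` offers every feature that code built for `built_for` may use."""
    return FEATURES[built_for] <= FEATURES[target]

print(can_run("M3", "M0+"))  # True: M3 is a superset of M0+
print(can_run("M0+", "M3"))  # False: M0+ lacks the v7-M additions
```

The subset test only says the instructions are available; it says nothing about peripherals or timing, which still differ between real chips.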

You also get support for more interrupts the higher up you go on the Cortex M ladder. The Cortex M0/M0+ designs support up to 32 interrupts, but if you move up to the M3/M4 you get up to 240.

All Cortex M processors have 32-bit memory addressability and the exact same memory map across all designs. ARM’s goal with these chips is to make moving up between designs as painless as possible.
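To make the shared memory map concrete, here is a small sketch. The 32-bit address width comes from the text; the region boundaries are not from this article but follow the default system address map in ARM's v6-M/v7-M architecture documentation, shown here in coarse form. The `region_of` helper is hypothetical.

```python
ADDRESS_BITS = 32
ADDRESS_SPACE_BYTES = 2 ** ADDRESS_BITS  # 4 GiB of addressable space

# Coarse default regions (start, end inclusive, name); assumption: taken from
# ARM's architecture documentation rather than from this article.
REGIONS = [
    (0x00000000, 0x1FFFFFFF, "Code"),
    (0x20000000, 0x3FFFFFFF, "SRAM"),
    (0x40000000, 0x5FFFFFFF, "Peripheral"),
    (0x60000000, 0x9FFFFFFF, "External RAM"),
    (0xA0000000, 0xDFFFFFFF, "External device"),
    (0xE0000000, 0xFFFFFFFF, "System"),
]

def region_of(addr):
    """Return the name of the region containing a 32-bit address."""
    if not 0 <= addr < ADDRESS_SPACE_BYTES:
        raise ValueError("address does not fit in 32 bits")
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name

print(ADDRESS_SPACE_BYTES // 2**30, "GiB")  # 4 GiB
print(region_of(0x20000100))                # SRAM
```

Because the layout is the same on every core in the family, a lookup like this would not need a per-core table.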

While we’ve spent the past few years moving to out-of-order designs in smartphone CPUs, the entire Cortex M family is made up of very simple, in-order architectures. The pipelines themselves are similarly simplified:

Cortex M0, M3 and M4 all feature 3-stage in-order pipelines, while the M0+ shaves off a stage of the design. In the 3-stage designs there’s an instruction fetch, instruction decode and a single instruction execute stage. In the event the decoder encounters a branch instruction, there’s a speculative instruction fetch that grabs the instruction at the branch target. This way regardless of whether or not the branch is taken, the next instruction is waiting with at most a 1 cycle delay.
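As a rough illustration of what a shorter pipeline and a 1-cycle branch delay mean, here is a toy cycle-count model. It is a sketch under heavy assumptions: every instruction takes one cycle, every branch pays the worst-case 1-cycle delay, and nothing else stalls. Real cores have multi-cycle instructions and memory wait states that this ignores, and `ideal_cycles` is a made-up name.

```python
def ideal_cycles(instructions, stages, branches=0, branch_penalty=1):
    """Cycles to retire `instructions` on an idealized single-issue pipeline.

    The first instruction needs `stages` cycles to pass through; after that one
    instruction completes per cycle. Each branch adds `branch_penalty` cycles
    (worst case assumed for every branch).
    """
    if instructions <= 0:
        return 0
    fill = stages - 1
    return fill + instructions + branches * branch_penalty

for name, stages in (("M0/M3/M4 (3-stage)", 3), ("M0+ (2-stage)", 2)):
    straight = ideal_cycles(1000, stages)
    branchy = ideal_cycles(1000, stages, branches=100)
    print(f"{name}: {straight} cycles straight-line, {branchy} with 100 branches")
```

In this model the pipeline depth only changes the fill latency, so the stage saved on the M0+ matters far less for throughput than the number of branches in the code.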

These aren’t superscalar designs: there’s only a 1-wide path for instruction flow down the pipeline and not many execution units to exploit. The Cortex M3 and M4 add some more sophisticated units (hardware integer divide in M3, MAC and limited SIMD in M4), but by and large these are simple cores for simple needs.

The range of operating frequencies for these cores is relatively low. ARM typically expects to see Cortex M designs in the 20 - 150MHz range, but the cores are capable of scaling as high as 800MHz (or more) depending on process node. There’s a corresponding increase in power consumption as well, which is why we normally see lower clocked Cortex M designs.
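Combining the DMIPS/MHz figures from the performance table with the clock range above gives a feel for absolute Dhrystone throughput. This is a back-of-the-envelope sketch: it assumes the 0.84/0.94 entry means 0.84 for the M0 and 0.94 for the M0+, and that DMIPS scales linearly with clock speed, which ignores memory wait states at higher frequencies.

```python
# DMIPS/MHz values from the article's table; the M0 / M0+ split is an assumption.
DMIPS_PER_MHZ = {
    "Cortex M0": 0.84,
    "Cortex M0+": 0.94,
    "Cortex M3/M4": 1.25,
}
CLOCKS_MHZ = (20, 150, 800)  # typical low end, typical high end, quoted ceiling

def estimated_dmips(dmips_per_mhz, clock_mhz):
    """Linear estimate: per-clock Dhrystone score times clock frequency."""
    return dmips_per_mhz * clock_mhz

for core, rate in DMIPS_PER_MHZ.items():
    row = ", ".join(
        f"{mhz}MHz: {estimated_dmips(rate, mhz):.1f}" for mhz in CLOCKS_MHZ
    )
    print(f"{core}: {row}")
```

For example, a Cortex M3/M4 at 150MHz works out to 187.5 DMIPS under these assumptions.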

Similar to the Cortex A and R lines, the Cortex M family has a roadmap ahead of it. ARM recently announced a new CPU design center in Taiwan, where Cortex M based cores will be designed. I view the Cortex M line today quite similarly to the early days of the Cortex A family. There’s likely room for a higher performing option in between Cortex M4 and Cortex A7. If/when we get such a thing I feel like we may see the CPU building block necessary for higher performance wearable computing.

14 Comments

  • bmastenbrook - Monday, August 18, 2014

    I feel like this article is missing some important context. The Cortex M series is an embedded microcontroller without a memory management unit (MMU). This means that it does not run standard Linux, which is why the Android Wear devices do not use a Cortex M. Android Wear is a heavyweight platform that uses hundreds of megabytes of RAM and requires the horsepower and services of a full CPU. Even a custom designed SoC for Android Wear would use a Cortex A-series part for this reason.

    There is also already a higher performing option between the M4 and the A7: the Cortex A5, which offers the best performance per watt in the current ARM lineup. I am sure there will be higher performing Cortex M series parts in the future, but usually applications with greater horsepower needs also require more memory, and can take better advantage of a higher performance Cortex A-series core. Once you have more DRAM at your disposal, you also don't need integrated flash (a common feature of Cortex M parts).

    As a microcontroller platform, there are substantially more Cortex M devices out there than Cortex A devices. Your smartphone might have one or two just providing auxiliary functions. Chances are you have several other electronic devices that have a Cortex M or three inside powering basic functions, especially now with Cortex M0 intruding on the 8-bit microcontrollers' turf (think AVR and 8051). And of course the Internet of Things is largely made up of Cortex M parts.
  • 1008anan - Thursday, August 21, 2014

    bmastenbrook - Monday, August 18, 2014

    Why do you assert that the Cortex A5 has a higher performance per watt than Cortex A7?

    The higher performing option between the M4 and A7 is called Cortex R7. Not to mention the widely expected next generation Cortex R series processor that should be released soon.

    I expect that much of the internet of things market will be addressed by the Cortex R series and future X86 compatible micro-architectures released under the "Quark" brand.

    When do you foresee an M processor (M5 maybe?) sporting a memory management unit that can run standard Linux?
  • HardwareDufus - Friday, August 22, 2014

    Cortex M3 is used in the Arduino Due, and Cortex M0+ is used in the recently announced Arduino Zero. These are Atmel chips that use ARM IP. Texas Instruments is using the Cortex M4 in its TivaC series of microcontrollers.

    Although you can't run a proper OS on these chips, there are alternative real-time operating systems like FreeRTOS, NilRTOS, and ChibiOS/RT that give you real-time multithreaded capability in a tiny memory footprint.

    There are versions of the Wiring/Processing IDE available for the Arduino (Arduino) and TI platforms (Energia) that hide a lot of the ugliness from beginning programmers. However, folks can use higher end tools like Texas Instruments' Code Composer or Atmel Studio. Additionally, there are CodeBlocks, Eclipse, XCode and Visual Studio plugins (in fact Atmel Studio is based on VS) so that you can use your favorite IDE.

    Furthermore, folks are developing very nice C++ event driven OO frameworks (like Cosa) that extend the use/flexibility of these minimalist platforms and overcome the shortcomings of existing libraries/infrastructure. (note: the Cosa example I listed is AVR only at the moment, but hopefully he will add classes and device drivers for some of the ARM stuff)

    Even without the OS, we can do things like use GSM modems, LCD screens in graphics modes, GPS, WiFi, serve up web pages, post to twitter (seriously) all from tiny devices using these basic ARM processors.

    (note: I'm a big user of the Arduino DUE (Cortex M3), but have been seduced by TI's TivaC because of its built-in Ethernet and EEPROM. Yes, the Cortex M3 has built-in Ethernet, but the Italians (the Arduino folks) didn't implement it on the DUE.)

    There are some microcontroller-type development boards that use A series ARM processors. Both the Arduino TRE and BeagleBone Black use the TI Sitara series microprocessor/microcontroller. When you step up to this class, you gain Linux.
    There's also a hybrid board offered by Arduino called the YUN, but it's an oddball that uses an AVR processor for I/O, but takes advantage of excess processing capability of the included WiFi adapter to run a scaled down Linux...

    Anyway, this stuff is fun to get into.... Have a look at Texas Instruments here: http://www.ti.com/lsds/ti/microcontrollers_16-bit_...
    or Arduino here: http://www.arduino.com
  • bminor13 - Monday, August 25, 2014

    Somewhat off-topic question - do you know of a set of free tools to program an Arduino Due that is more advanced than the Arduino IDE? I'm looking for a toolchain that allows for more complex applications so that I can explore the capabilities of the Cortex M3.
  • extide - Monday, August 18, 2014

    Great article! I love these little Cortex M CPUs. They are quite a bit faster than your typical ATMega or something, which can be kinda handy for some things. I am working on a few different designs that will use Cortex M3s as the MCU!
  • PICman - Monday, August 18, 2014

    It's amazing to see a 32 bit processor in a 1.6x2.0 mm package with 20 'pins', both from a technology and a marketing standpoint. For typical microcontroller applications it will compete with some brutal competition where price is key. I wonder how many they will sell? A better question is 'how many engineers are willing to pay $0.10 more to go from 8 or 16 to 32 bits?' I don't mean this in a skeptical way, more just curiosity.
  • ET - Tuesday, August 19, 2014

    "32-bit microcontrollers accounted for over 34% of the market in 2013; the segment is further expected to dominate global market revenue over the next six years." (from https://www.linkedin.com/today/post/article/201406...
  • tuxfool - Tuesday, August 19, 2014

    ARM can exploit the fact that the instruction set scales all the way from the ultra low power M0 up to very capable embedded controllers like the M4. Add to that the fact that they can exploit economies of scale, since the core is used across SoCs from various vendors, just like the A series. The pricing on some of these SoCs is very aggressive.
  • Slaanesh - Tuesday, August 19, 2014

    I wonder how these extreme low power chips compare to ancient x86 processors like the 486DX performance wise... Would they be somewhere in the same league?
  • Kvaern - Wednesday, August 20, 2014

    http://www.netlib.org/performance/html/dhrystone.d...

    Enjoy.
