HP Moonshot

We discussed the HP Moonshot back in April 2013. The Moonshot is HP's answer to SeaMicro's SM15000: a large 4.3U chassis with no fewer than 45 cartridges that share three different fabrics: network, storage, and clustering. Each cartridge can contain one to four micro servers or "nodes". Just like in a blade server, cooling (five fans), power (four PSUs), and uplinks are shared.

Back in April 2013, the only available cartridge was based on the anemic Atom S1260, a real shame for such an excellent chassis. Since Q4 2014, HP has offered six different cartridges ranging from the Opteron X2150 (m700) to the rather powerful Xeon E3-1284Lv3 (m710). The different models are all tailored to specific workloads. The m700 is meant to be used in a Citrix virtual desktop environment while the m710 is targeted at video transcoding. We tested the m400 (X-Gene at 2.4GHz), m300 (Atom C2750), and m350 (four Atom C2730 nodes) cartridges.

The m400 is the first server we have seen that uses the 64-bit ARMv8 AppliedMicro X-Gene. HP positions the m400 as the heir of mobile computing, and touts its energy efficiency. Other differentiators are memory bandwidth and capacity. The X-Gene has a quad-channel memory controller and as a result is the only cartridge with eight DIMMs. We were very interested in understanding how X-Gene would compare to the Intel Xeons. HP positions the m400 as the micro server for web caching (memcached) and web applications (LAMP). The m400 also comes with beefy storage: you can order a 480GB SSD with a SATA or M.2 interface.

The m300 cartridge is based on the Atom C2750 with support for up to 32GB of RAM. HP positions this cartridge as "web infrastructure in a box". The m400 is mostly about web caching and the web front-end while the m300 seems destined to run the complete stack (front- and back-end). However, it is clear that there is some overlap between the m300 and m400 as there's nothing to stop you from running a complete "web infrastructure" on the m400 if it runs well in 32GB or less.

The m350 cartridge is all about density: you get four nodes in one cartridge. There is a trade-off, however: you are limited to 16GB of RAM and can only use M.2 flash storage, limited to 64GB.

Each node of the m350 is powered by one of Intel's most interesting SKUs, the 1.7GHz 8-core Atom C2730, which has a very low 12W TDP. The m350 is positioned as a way to offer managed hosting on physical (as opposed to virtualized) servers in a cost-effective way.
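To put the density claim in numbers, here is a rough back-of-the-envelope sketch using only the figures quoted above (45 cartridges per 4.3U chassis, four nodes per m350, eight cores and 12W TDP per Atom C2730). It assumes a chassis fully populated with m350 cartridges, which is our assumption and not a configuration HP specifies here. The function and constant names are made up for illustration, and the TDP total covers the CPUs only, not actual power draw.

```python
# Figures quoted in the text above
CARTRIDGES_PER_CHASSIS = 45
NODES_PER_M350 = 4
CORES_PER_C2730 = 8
TDP_WATTS_PER_C2730 = 12
CHASSIS_HEIGHT_U = 4.3


def m350_chassis_totals(cartridges=CARTRIDGES_PER_CHASSIS):
    """Totals for a chassis filled with m350 cartridges (assumed config)."""
    nodes = cartridges * NODES_PER_M350
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_C2730,
        # Sum of CPU TDPs only; fans, memory, and storage are not included.
        "cpu_tdp_watts": nodes * TDP_WATTS_PER_C2730,
        "nodes_per_u": nodes / CHASSIS_HEIGHT_U,
    }


totals = m350_chassis_totals()
print(totals)
```

Under these assumptions a single chassis holds 180 physical nodes and 1,440 Atom cores, with a combined CPU TDP of 2,160W.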

47 Comments

  • Wilco1 - Tuesday, March 10, 2015 - link

    GCC4.9 doesn't contain all the work in GCC5.0 (close to final release, but you can build trunk). As you hinted in the article, it is early days for AArch64 support, so there is a huge difference between a 4.9 and 5.0 compiler, so 5.0 is what you'd use for benchmarking.
  • JohanAnandtech - Tuesday, March 10, 2015 - link

    You must realize that the situation in the ARM ecosystem is not as mature as on x86. The X-Gene runs on a specially patched kernel that has some decent support for ACPI, PCIe etc. If you do not use this kernel, you'll get into all kinds of hardware trouble. And afaik, gcc needs a certain version of the kernel.
  • Wilco1 - Tuesday, March 10, 2015 - link

    No, you can use any newer GCC and GLIBC with an older kernel - that's the whole point of compatibility.

    Btw your results look wrong - X-Gene 1 scores much lower than Cortex-A15 on the single threaded LZMA tests (compare with results on http://www.7-cpu.com/). I'm wondering whether this is just due to using the wrong compiler/options, or running well below 2.4GHz somehow.
  • JohanAnandtech - Tuesday, March 10, 2015 - link

    Hmm. The A57 scores 1500 at 1.9 GHz on compression. The X-Gene scores 1580 with gcc 4.8 and 1670 with gcc 4.9. Our scores are on the low side, but it is not like they are impossibly low.

    Ubuntu 14.04, the 3.13 kernel and gcc 4.8.2 was and is the standard environment that people will get on the m400. You can tweak a lot, but that is not what most professionals will do. Then we would also have to start testing with icc on Intel. I am not convinced that the overall picture will change that much with lots of tweaking.
  • Wilco1 - Tuesday, March 10, 2015 - link

    Yes, and I'd expect the 7420 will do a lot better than the 5433. But the real surprise to me is that X-Gene 1 doesn't even beat the A15 in Tegra K1 despite being wider, newer and running at a higher frequency - that's why the results look too low.

    I wouldn't call upgrading to the latest compiler tweaking - for AArch64 that is kind of essential given it is early days and the rate of development is extremely high. If you tested 32-bit mode then I'd agree GCC 4.8 or 4.9 are fine.
  • CajunArson - Tuesday, March 10, 2015 - link

    This is all part of the problem: Requiring people to use cutting edge software with custom recompilation just to beat a freakin' Atom much less a real CPU?

    You do realize that we could play the same game with all the Intel parts. Believe me, the people who constantly whine that Haswell isn't any faster than Sandy Bridge have never properly recompiled computationally intensive code to take advantage of AVX2 and FMA.

    The fact that all those Intel servers were running software that was only compiled for a generic X86-64 target without requiring any special tweaking or exotic hacking is just another major advantage for Intel, not some "cheat".
  • Klimax - Tuesday, March 10, 2015 - link

    And if we are going for a cutting edge compiler, then why not ICC with Intel's nice libraries... (pretty sure even the ancient Atom would suddenly look not that bad)
  • Wilco1 - Tuesday, March 10, 2015 - link

    To make a fair comparison you'd either need to use the exact same compiler and options or go all out and allow people to write hand optimized assembler for the kernels.
  • 68k - Saturday, March 14, 2015 - link

    You can't seriously claim that recompiling an existing program with a different (well known and mature) compiler is equal to hand-optimizing things in assembler. Hint: one of the options is ridiculously expensive, one is trivial.
  • aryonoco - Monday, March 9, 2015 - link

    Thank you Johan. Very very informative article. This is one of the least reported areas of IT in general, and one that I think is poised for significant uptake in the next 5 years or so.

    Very much appreciate your efforts into putting this together.
