Knights Landing
Today, Intel made two fairly large announcements: its Omni Scale Fabric, and the integration of that fabric into its next generation of Xeon Phi chips, Knights Landing. Additionally, Intel has worked with Micron to give Knights Landing high-performance on-package memory, also known as Hybrid Memory Cube (HMC).
Knights Landing, the next-generation Xeon Phi product announced today, will use Intel’s Silvermont CPU architecture, modified (or as Intel says, enhanced) for HPC. Intel is setting the expectation that these cores will deliver three times the single-threaded performance of the previous generation while remaining binary compatible with Intel Xeon processors.
As for the on-package memory itself, the HMC will support up to 16GB at launch while taking up only one-third the space of GDDR5, a figure based on comparing Knights Landing with Knights Corner (the previous generation). Intel and Micron are also claiming five times the bandwidth of DDR4 for the same amount of memory. That 5x comparison isn’t necessarily a fair one, though: DDR4 is still in its infancy and DDR3 in servers is fairly slow, so GDDR5 would be a more appropriate baseline. The two companies are also claiming 5x the power efficiency of GDDR5, again based on comparisons between Knights Landing and Knights Corner. Keep in mind, however, that Micron says it will only be shipping 2GB and 4GB parts this year, which makes the 16GB memory for Knights Landing a 2015 product.
The Omni Scale Fabric from Intel is designed to deliver maximum bandwidth and scalability between Intel’s future Xeon and Xeon Phi products, with interoperability between Knights Landing and Xeon processors arriving with the 14nm generation of Xeon server processors. The Omni Scale Fabric lineup will include PCIe adapters, edge switches, director systems, Intel’s own silicon photonics, and open software tools. All of these are designed to make the upgrade to Omni Scale Fabric less painful.
In terms of performance, Knights Landing will deliver “3 TFLOPS of peak theoretical double-precision performance,” a figure that is preliminary and based on “expectations of cores, clock frequency and floating point operations per cycle.” In other words, the chip hasn’t actually been benchmarked yet, and we will have to see what the end product delivers when Intel starts shipping commercial systems in the second half of 2015.
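A peak theoretical figure of this kind is simply the product of the three quantities Intel names: core count, clock frequency, and floating-point operations per core per cycle. As a minimal sketch, the calculation looks like the following. The input values are placeholders chosen for illustration only, not specifications Intel has published, and the function name is hypothetical.

```python
def peak_tflops(cores: int, clock_ghz: float, dp_flops_per_cycle: int) -> float:
    """Peak theoretical double-precision TFLOPS for a chip.

    peak = cores * clock (cycles/second) * DP FLOPs per core per cycle
    """
    flops_per_second = cores * (clock_ghz * 1e9) * dp_flops_per_cycle
    return flops_per_second / 1e12  # convert FLOPS to TFLOPS


if __name__ == "__main__":
    # Placeholder inputs (assumptions, NOT announced Knights Landing specs).
    cores = 64
    clock_ghz = 1.5
    dp_flops_per_cycle = 32

    print(f"{peak_tflops(cores, clock_ghz, dp_flops_per_cycle):.3f} TFLOPS")
```

Many different combinations of core count and clock speed multiply out to the same headline number, and a peak figure says nothing about sustained throughput, which is why the shipping product's real-world performance remains to be seen.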
Knights Landing also already has a supercomputer design win: NERSC has selected it for its planned Cori supercomputer, which is expected to deliver over 3 TFLOPS per Knights Landing node.