Even though the HBM (High Bandwidth Memory) standard only launched last June in the form of AMD’s Fiji GPU, that memory was considered a ‘trial run’ for HBM2 – a memory standard which is here to stay. Launching in mid-2016 with AMD Polaris and NVIDIA Pascal, the HBM2 memory standard will redefine computing as we know it. There are several memory standards vying to replace DDR and GDDR, including Intel-Micron 3D XPoint (pronounced: Cross Point) Optane memory – but HBM looks to have the widest support.
Compare this to first-generation HBM: each stack (‘cube’) held 1 GB and ran at a 500 MHz clock (1 Gbps per pin), with Fiji’s four-cube configuration giving a total of four gigabytes of memory. HBM2 gives high-performance silicon vendors the ability to scale from one to eight cubes, going from 2 GB to as much as 64 GB of memory. The bandwidth is nothing to sneeze at either – from 128 GB/s to 2 TB/s.
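As a sanity check on these figures, the peak bandwidth of a stacked-memory configuration is simply pin rate × bus width × number of cubes. A minimal Python sketch, assuming the standard 1024-bit interface per cube:

```python
def hbm_bandwidth_gbs(pin_rate_gbps: float, cubes: int, bus_width_bits: int = 1024) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbit/s) x bus width (bits) x cubes, / 8 bits per byte."""
    return pin_rate_gbps * bus_width_bits * cubes / 8

# First-generation HBM on AMD Fiji: 1 Gbps pins, four cubes
print(hbm_bandwidth_gbs(1.0, 4))   # -> 512.0 GB/s
# HBM2 at full speed: 2 Gbps pins, from one cube to eight
print(hbm_bandwidth_gbs(2.0, 1))   # -> 256.0 GB/s
print(hbm_bandwidth_gbs(2.0, 8))   # -> 2048.0 GB/s, i.e. 2 TB/s
```

The 128 GB/s floor quoted in the text corresponds to a single cube running at the lower 1 Gbps pin rate.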
Given that the slides come from an industry panel held last year, don’t expect NVIDIA Pascal or AMD Polaris with anything more than a four-cube configuration of 4 GB cubes, for a total of 16 GB of HBM2 memory at a 1 GHz clock, achieving 1 TB/s. AMD FirePro and NVIDIA Tesla chips for servers (HPC) will probably launch with a four-cube configuration of 8 GB cubes, for 32 GB of total capacity. Graphics accelerators with 32 GB of memory are something both AMD and NVIDIA need in order to fight off the advance of Intel’s new Xeon Phi (codenamed Knights Landing), which will offer 16 GB of embedded memory at 400 GB/s and up to 384 GB of external DDR4-2133 memory.
Even though the focus of all the major tech media will be on HBM in graphics (AMD, NVIDIA, Intel), there is another player that might lead the adoption of HBM2 in the HPC and server space – Cisco. Cisco is now designing its next generation of networking equipment, which will migrate from GDDR5 memory (probably the only non-graphics application of that memory) to HBM2, and is targeting 1.2 kW 1U networking switches (yes, roughly 50 kW of heat dissipation for cooling the networking rack alone) supporting the 10, 20, 40 and 100 Gbps Ethernet standards. Given that Cisco needs to drive 10 GB/s per Ethernet port, a 48-port switch has to deliver 480 GB/s. Thus, don’t be surprised to see a custom FPGA or ASIC driving eight HBM2 cubes at 2.0 Gbps.
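The switch arithmetic above can be sketched in a few lines of Python. The per-port figure is the article’s 10 GB/s, and the supply side assumes standard 1024-bit HBM2 cubes at the quoted 2.0 Gbps per pin:

```python
def switch_demand_gbs(ports: int, gb_per_s_per_port: float) -> float:
    """Aggregate memory bandwidth the switch ASIC must sustain, in GB/s."""
    return ports * gb_per_s_per_port

def hbm2_supply_gbs(cubes: int, pin_rate_gbps: float = 2.0, bus_width_bits: int = 1024) -> float:
    """Bandwidth delivered by a given number of HBM2 cubes, in GB/s."""
    return cubes * pin_rate_gbps * bus_width_bits / 8

print(switch_demand_gbs(48, 10))   # -> 480.0 GB/s needed by a 48-port switch
print(hbm2_supply_gbs(8))          # -> 2048.0 GB/s available from eight cubes
```

Eight full-speed cubes leave plenty of headroom over the 480 GB/s the ports demand – useful, since switch buffers also absorb bursts and packet-processing traffic on top of raw line rate.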
Still, that doesn’t mean AMD, Intel or NVIDIA will rest on their laurels with a four-cube memory implementation. The age of the 8192-bit memory bus is just a generation or two away, and as the cost of HBM2 memory comes down, GDDR5X and perhaps GDDR6 will go down in history as the last SGRAM modules, replaced by single- or dual-cube configurations on entry-level hardware.
The same question arises for DDR4/DDR5 system memory. 64 GB of system memory for a server-grade CPU might sound like the ‘2015 or 2016 standard’ in the enterprise, and even underpowered in the HPC space, but some of our clients would optimize their HPC code for 64 GB instead of 128 or even 192 GB if that meant bandwidth going from 0.068 TB/s (quad-channel DDR4-2133) to 2 TB/s. Will HBM scale all the way down to mobile processors, with 4 or 8 GB of HBM in cars and airplanes?
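The quoted DDR4 figure checks out with simple arithmetic – 64-bit channels and the transfer rate in MT/s. A minimal sketch:

```python
def ddr_bandwidth_tbs(mt_per_s: int, channels: int, channel_width_bits: int = 64) -> float:
    """Peak DDR bandwidth in TB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * (channel_width_bits / 8) * channels / 1e12

quad_ddr4 = ddr_bandwidth_tbs(2133, 4)
print(round(quad_ddr4, 3))       # -> 0.068 TB/s for quad-channel DDR4-2133
print(round(2.0 / quad_ddr4))    # -> an eight-cube HBM2 setup is roughly 29x faster
```

That ~29x bandwidth gap is exactly why trading total capacity for HBM-class throughput can be a win for bandwidth-bound HPC code.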