Samsung HBM vs. HBM2 - 4Gbit on the left, 8Gbit on the right.
Samsung Semiconductor is on a roll of late. The company introduced FinFET transistors on its 14nm logic process last year, beating Intel to a new manufacturing node for the first time in history. That 14nm process expanded from in-house Exynos SoCs to customers such as Apple, Qualcomm and others, while Intel was still trying to get the Broadwell architecture out the door. The process node win was followed by the announcement of ultra-dense 15nm NAND flash memory, and now the company has announced mass production of the next-generation memory standard, HBM2.

Samsung 4GB HBM2 DRAM overview
The company announced that it has started mass production of High Bandwidth Memory 2 chips on its 20nm process, just as the standard was ratified by the memory standards body, JEDEC. You can download the whitepaper for the JESD235A HBM memory standard here, along with its extremely dense 4994-ball packaging layout. Samsung is manufacturing 4GB HBM2 chips, with mass production of 8GB HBM2 chips to follow in the second half of 2016. Both the 4GB and 8GB chips deliver an impressive 256 GB/s of bandwidth per chip.
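For context, that 256 GB/s headline figure follows directly from the interface width and per-pin data rate. Here is a rough back-of-the-envelope sketch, assuming the commonly cited HBM2 configuration of a 1024-bit interface running at 2 Gb/s per pin (the exact bin Samsung ships is not specified in the announcement):

```python
# Back-of-the-envelope HBM2 per-stack bandwidth.
# Assumed figures: 1024-bit interface, 2 Gb/s per data pin (top HBM2 speed bin).
interface_width_bits = 1024          # data bits transferred per cycle across the stack
pin_speed_gbps = 2.0                 # assumed per-pin data rate in Gb/s

bandwidth_gbits = interface_width_bits * pin_speed_gbps   # gigabits per second
bandwidth_gbytes = bandwidth_gbits / 8                    # gigabytes per second

print(f"Per-stack bandwidth: {bandwidth_gbytes:.0f} GB/s")   # -> 256 GB/s
```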
When AMD debuted its Fiji architecture in June 2015, it marked the debut of the HBM (High Bandwidth Memory) standard, which, as is often the case, originated with an in-house AMD team that designs high-speed memory and was later ratified through JEDEC. SK Hynix was the only HBM producer at the time, manufacturing 1GB HBM1 chips for the Radeon R9 Fury lineup. With HBM2, a Fury-class card would come not with 4GB but with no less than 16GB of memory, and once 8GB chips reach production, a single Fiji-class chip could ship with 32GB.
With bandwidth doubled from HBM1 to HBM2, AMD Greenland and NVIDIA Pascal will come to market with 16GB of memory (four chips) at 1TB/s, while the SC16 conference in Salt Lake City could see the launch or public demonstration of 32GB computational cards, again with 1TB/s of bandwidth. What makes HBM2 fascinating is that it offers higher bandwidth than the L2 and L3 caches of AMD and Intel processors (latency is another matter), as the comparison sketch after the figures below illustrates. For example, these are the read/write figures for an Intel Core i7-6700K:
- L1 Cache: 1.04 TB/s (r); 520 GB/s (w)
- L2 Cache: 460 GB/s (r); 310 GB/s (w)
- L3 Cache: 280 GB/s (r); 240 GB/s (w)
- 2-Channel DDR4-2133: 31.2 GB/s (r); 32.5 GB/s (w)
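A quick sketch of how a four-stack HBM2 configuration compares against those read figures. The i7-6700K numbers are simply the values quoted above, and the HBM2 aggregate assumes four stacks at 256 GB/s each:

```python
# Aggregate bandwidth of a 4-stack HBM2 configuration versus the
# Core i7-6700K read bandwidths quoted in the list above (values in GB/s).
hbm2_stacks = 4
hbm2_per_stack = 256
hbm2_total = hbm2_stacks * hbm2_per_stack          # 1024 GB/s, i.e. ~1 TB/s

i7_6700k_read = {
    "L1": 1040,              # 1.04 TB/s
    "L2": 460,
    "L3": 280,
    "DDR4-2133 (2ch)": 31.2,
}

for level, gbps in i7_6700k_read.items():
    winner = "4x HBM2" if hbm2_total > gbps else level
    print(f"{level:>16}: {gbps:>7} GB/s vs 4x HBM2 {hbm2_total} GB/s -> {winner} is faster")
```

Only the L1 cache keeps ahead of the 1TB/s aggregate; L2, L3 and system DRAM all fall behind it.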
The arrival of higher-capacity HBM2 memory should suit AMD very well, given that the company plans to integrate a single HBM2 chip into future consumer and enterprise APUs. This bodes especially well for consumers, since the company traditionally does not use an L3 cache in its APUs. For example, these are the figures for AMD's A10-7870K 'Godavari' APU, compared against a single HBM2 stack in the sketch after the list:
- L1 Cache: 240 GB/s (r); 115 GB/s (w)
- L2 Cache: 220 GB/s (r); 80 GB/s (w)
- 2-Channel DDR3-2133: 19.8 GB/s (r); 12.5 GB/s (w)
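The same arithmetic for the APU side, taking the A10-7870K write figures quoted above as-is and assuming a single 256 GB/s HBM2 stack:

```python
# One HBM2 stack (assumed 256 GB/s) versus the A10-7870K write bandwidths listed above.
hbm2_single_stack = 256   # GB/s

a10_7870k_write = {
    "L1": 115,
    "L2": 80,
    "DDR3-2133 (2ch)": 12.5,
}

for level, gbps in a10_7870k_write.items():
    ratio = hbm2_single_stack / gbps
    print(f"One HBM2 stack = {ratio:.1f}x the A10-7870K {level} write bandwidth ({gbps} GB/s)")
```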
For AMD, HBM2 memory cannot come soon enough, as even a single HBM2 chip paired with the current Steamroller architecture would offer more write bandwidth than AMD's own L1 cache. For graphics and computational chips, though, the battle between AMD's Greenland-based FirePro, NVIDIA's Pascal-based Tesla and Intel's Knights Corner-based Xeon Phi will be a great one to watch:
- AMD FirePro Polaris – 16GB / 32GB HBM2; 1TB/s r/w
- NVIDIA Tesla P-Series (Pascal) – 16GB / 32GB HBM2; 1TB/s r/w
- Intel Xeon Phi – Knights Corner – 16GB; 400GB/s r/w