
Silicon Technology Outlook Analysis for 2014

In this feature article, we present the roadmaps for the ongoing year of a few major semiconductor companies that are important for the server, PC, tablet and to some extent mobile markets. The companies presented here are sorted alphabetically. Especially for the mobile markets we could have included a number of additional vendors but we opted not to due to scarcity of information about upcoming products. 

AMD

The year began with a few product introductions by AMD. The company introduced its Kaveri APU at the beginning of the year, which we have already covered in ample detail. Kaveri is AMD’s first chip to feature Steamroller, the third generation of the Bulldozer architecture. Steamroller is meant to improve IPC considerably by fixing some of the bottlenecks present in the design.

While Kaveri is the first HSA-enabled product from AMD, HSA features need software support that is expected to come only later in 2014. One of these features is heterogeneous queuing (hQ), which AMD detailed a few months ago. For now Kaveri will have to stand its ground in classical CPU and GPU workloads, while the HSA features could work like an afterburner that is ignited a little later. One interesting question is how AMD will meet the growing demand for memory bandwidth from its ever-growing integrated GPU cores. For now it seems AMD endorses the use of high-performance DDR3-2400 memory, which provides 50% more bandwidth than standard DDR3-1600 modules. While JEDEC only specified DDR3 up to 1866MHz, there is ample supply of 2133MHz and 2400MHz memory, albeit at a price premium that works against AMD.
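The 50% figure follows directly from the transfer rates. As a rough sketch, the theoretical peak bandwidth can be computed from the data rate and the 64-bit (8-byte) width of a DDR3 channel. The dual-channel default and the function name are assumptions made for this illustration, and real-world throughput will be lower than these peak values.

```python
# Theoretical peak bandwidth of DDR3 memory.
# One DDR3 channel is 64 bits (8 bytes) wide.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mega_transfers_per_s, channels=2):
    """Return the theoretical peak bandwidth in GB/s (10^9 bytes/s).

    channels=2 is an assumption (a dual-channel configuration).
    """
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


ddr3_1600 = peak_bandwidth_gbs(1600)
ddr3_2400 = peak_bandwidth_gbs(2400)
uplift = ddr3_2400 / ddr3_1600 - 1

print(f"DDR3-1600: {ddr3_1600:.1f} GB/s")
print(f"DDR3-2400: {ddr3_2400:.1f} GB/s")
print(f"Uplift:    {uplift:.0%}")
```

The relative uplift does not depend on the number of channels, since both configurations scale by the same factor.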

Small-core APUs will get updates called Beema and Mullins, which add slight improvements to the CPU cores, now called Puma (the predecessor was Jaguar). Most disappointingly, they will not get any HSA support. Still, they are small but important updates for tablets and small laptops: the improvements allegedly focus on power efficiency to better compete against Intel’s Bay Trail Atom SoCs.

At this point we know nothing about a successor to AMD’s FX line of CPUs, which is very popular with gamers who rely on discrete graphics. The current Vishera core was launched over a year ago and, according to AMD’s public roadmap, will remain the only offering in this space for 2014. This sparked rumors that AMD plans to phase out this type of product sooner or later, which AMD officially denied. Given our appreciation for competition, we would love AMD to refresh its big CPU cores with a Steamroller-based product to deliver performance improvements, but as of yet we have not even the slightest hint that this might come true anytime soon.

On the server side of things, there is both good and so-so news. The big-iron Opterons just got a refresh, which basically extended the Opteron 6000 series towards the bottom. The letdown is that these are still based on the Piledriver core. Recently roadmaps surfaced that show a direction similar to the one publicized for the desktop: there is an absence of new big CPUs. AMD is also trying to establish APUs as a server product, but so far with limited success. One of the reasons may be that APUs currently cannot be used to accelerate many traditional workloads and thus have to target certain niches, which aren’t as big a market.

On the bright side, AMD will enter the market for 64-bit ARM server SoCs in the second half of 2014. After Calxeda unfortunately went bankrupt, this could be a very promising market entry, provided AMD gets everything right. The ARM ecosystem is still very competitive and it remains to be seen how AMD can prevail against a plethora of vendors who also want a piece of the pie. Recently AMD announced its first ARM server chip and laid out very aggressive business goals. Basically the company aims to lead the market in ARM-based server chips. It was also confirmed that the ARM64 SoCs already support DDR4 memory. Given that the first DDR4-capable APU from AMD is slated to come out in 2015, this will be AMD’s first product incorporating the new memory technology.

AMD disclosed very limited performance data, which nevertheless allows some comparisons with other CPUs. AMD disclosed a SPECint2006_rate of 80 for the Opteron A1100, which has a 25W TDP and is clocked at 2GHz or higher. AMD gave a figure of 28.1 in the same benchmark for the Opteron X2150, which is clocked at 1.9GHz and comes with a 22W TDP, but also integrates a GPU. For workloads where the GPU isn’t needed, this is a huge boost in performance per watt and also in compute density. For comparison, Intel gives an estimated SPECint2006_rate of 105 for its Atom C2750 (8-core Avoton, 2.4GHz, 20W TDP). If we assume the ARM-based Opteron is clocked at a straight 2GHz, this puts it very close to Intel’s small-core server CPU.
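To make the comparison concrete, here is a minimal sketch that normalizes the vendor-provided scores quoted above by TDP and by clock speed. It assumes the A1100 runs at exactly 2GHz and treats TDP as a stand-in for actual power draw, which it is not; the dictionary layout and function names are purely illustrative.

```python
# Vendor-provided SPECint2006_rate figures quoted in the text:
# name -> (score, TDP in watts, clock in GHz)
chips = {
    "Opteron A1100": (80.0, 25.0, 2.0),   # clock assumed to be exactly 2GHz
    "Opteron X2150": (28.1, 22.0, 1.9),
    "Atom C2750": (105.0, 20.0, 2.4),
}


def score_per_watt(score, tdp_w):
    """Score divided by TDP; TDP is only a rough proxy for power draw."""
    return score / tdp_w


def score_per_ghz(score, clock_ghz):
    """Score divided by clock speed, a crude per-clock throughput metric."""
    return score / clock_ghz


for name, (score, tdp_w, clock_ghz) in chips.items():
    print(f"{name:14s} {score_per_watt(score, tdp_w):5.2f} per W "
          f"{score_per_ghz(score, clock_ghz):6.2f} per GHz")
```

On a per-clock basis the A1100 and the Atom C2750 land close together (40 versus roughly 44 points per GHz), which is the sense in which the two are "very close"; per watt of TDP the Atom keeps a clearer lead.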

The SPECint2006 numbers also show why AMD is less enthused about its big Opteron cores. The 8-core ARM SoC comes dangerously close to a single-socket Opteron 3300 series, at least in this integer-centric benchmark. A 65W Opteron 3380 comes in at a score of 132 as per AMD’s own numbers. Within the same TDP headroom you can use two A1100-based servers to beat the big CPU core. Of course this advantage can only be leveraged if the workload parallelizes well, which is usually true for typical server workloads. The downside is that the floating point performance of the small x86 cores and of the ARM core is substantially lower, so it depends on the target application too. Finally, we’d like to note that some of the vendor-provided SPECint numbers are based on simulations, so real-world performance may deviate a bit.
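The TDP-budget argument can be sketched in a few lines. This assumes throughput scales linearly across independent nodes, which only holds for workloads that parallelize well, and it ignores the power drawn by everything outside the CPU; the helper name is made up for this illustration.

```python
def nodes_in_budget(budget_w, node_tdp_w):
    """Number of whole nodes that fit into a given TDP budget."""
    return int(budget_w // node_tdp_w)


# Figures quoted in the text.
OPTERON_3380_SCORE = 132.0
OPTERON_3380_TDP_W = 65.0
A1100_SCORE = 80.0
A1100_TDP_W = 25.0

count = nodes_in_budget(OPTERON_3380_TDP_W, A1100_TDP_W)
# Assumption: aggregate throughput is the sum of the per-node scores.
aggregate = count * A1100_SCORE

print(f"A1100 nodes within {OPTERON_3380_TDP_W:.0f}W: {count}")
print(f"Aggregate score: {aggregate:.0f} vs {OPTERON_3380_SCORE:.0f}")
print(f"Advantage: {aggregate / OPTERON_3380_SCORE - 1:.0%}")
```

Two A1100 chips use 50W of the 65W budget, so the comparison even leaves some headroom on the ARM side.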

On the graphics side of things, much less is known about AMD’s plans for the next year. After the launch of Hawaii in September, it is pretty safe to say that this will be the last major 28nm GPU we will see. The process is very mature by now and it is both thermally and economically prohibitive to make an even bigger chip at this node. The big question is when 20nm manufacturing will be ready for prime time for high-performance GPUs and what benefits it brings beyond the physical size reduction.

We can see AMD launching a refreshed product based on a shrunk midrange GPU as its first 20nm product, an approach the company also took in past generations. Of course AMD’s strategy here also depends on what NVIDIA will do, which we will discuss later in this article. If there is strong competitive pressure and 20nm is not yet ready, AMD will probably try to optimize its existing chips for even more performance. Given that it already did that to some degree with the rebranded 280X, 270X, 270 and 260X graphics cards, the potential here is a bit limited though. For now AMD seems to be busy fully delivering all the features of its current generation, most notably Mantle, which recently had a rather bumpy introduction and still needs some polish on the driver side.

Imagination Technologies

When it comes to Imagination Technologies, the primary focus is on GPU IP, which includes the PowerVR architecture commonly licensed by companies like Apple and many other ARM (and x86) SoC vendors. They also bought MIPS last year, which increases their focus on CPU architecture and on licensing their IP into as many devices as they could possibly imagine. They are a lot like ARM in that sense, with the exception of their Caustic Graphics ray-tracing GPUs, which they sell directly or through OEMs. However, that business appears to be taking more time than they had anticipated to gain momentum. They may eventually just end up licensing the IP into vendor silicon like they do with PowerVR and MIPS.

In the world of MIPS, Imagination is inheriting a business that they don’t have much experience in, as they’ve traditionally been a graphics company. However, IMG wants to bring their GPUs together with their CPUs, bundle the IP much like ARM does, and implement HSA like AMD, ARM and MediaTek are looking to do.

Since IMG isn’t a company that really produces many, if any, products themselves, we won’t speak about them as extensively as we do about the other companies listed in this article. In 2014, we can expect to see more of MIPS’ multi-core P5600, the first MIPS Series5 “Warrior P-class” CPU, as they already gained their first licensee late last year. This Series5 Warrior P-class of CPUs is based on a 32-bit architecture, which clearly indicates that MIPS’ goal is to be widely accessible and more easily programmable. In addition to this, they claim low power features in a silicon footprint up to 30% smaller than comparable CPU cores.

In terms of their GPU IP for PowerVR, we can expect to see the PowerVR 6XT series probably landing within devices in 2015, since PowerVR’s new architectures usually don’t get implemented immediately after they are announced. There is a good chance we could see this architecture in the next iPhone, but I’m not entirely convinced that could happen just yet. But the fact that it delivers 50% better performance AND better power management could push Apple to work to get it integrated inside of their iPhones and iPads. The support for 4K resolution will also be a big deal, especially as the resolution of tablets is steadily increasing and the GPUs continue to consume more and more power, especially on the iPad.

2014 will be an interesting year for Imagination because of their new company structure, the question of how they will continue to gain licensees for MIPS in addition to PowerVR, and whether they can find ways to hedge against the inevitable departure of Apple from the PowerVR architecture. While this is likely years away, if Apple wants it to happen, it will happen. And the truth is that they already employ multiple graphics architectures and, knowing Apple, they will eventually want to develop their own, much like they do with their CPUs. I’m sure that this is already in development, but it will take them a good amount of time until they are ready and capable of replacing PowerVR’s IP.

Intel

In the mainstream desktop segment 2014 is alleged to bring refreshed Haswell CPUs, which are reported to bring very modest clock speed bumps to the table. We are talking about an additional 100MHz at the same positioning/price point. Technically the chips will be identical to what was launched as the 4th generation Core i-series CPUs. Along with the CPUs, which are scheduled to launch around Computex, come new chipsets: the 9 series with the usual SKUs like H97 and Z97. The chipsets won’t bring many innovations with them though.

Meanwhile laptops will get the 14nm Broadwell core, which should be launched around Computex. Due to the 14nm delay, which we more or less predicted even though Intel claimed as late as September at IDF that the process was on schedule, those chips will only be in products in meaningful quantities in the second half of the year. Broadwell is a tick in Intel’s tick-tock model, which equates to a die shrink with only very minor changes to the logic itself. The GPU is said to receive 20% more execution units, which means 24 for the GT2 version and 48 for GT3.

The 14nm manufacturing process reduces the die size considerably and makes it more economical for Intel to create these chips. Usually improvements in manufacturing technology also bring power improvements. However, Intel has been very sparing with information on its 14nm process technology. Given the difficulties with silicon manufacturing at these sizes, we will have to wait until final chips can be independently tested to make a statement about the characteristics of Intel’s 14nm technology.

In the fourth quarter enthusiasts can keep their fingers crossed for a special product codenamed Broadwell-K. This is essentially one or two special SKUs for the desktop market with the K moniker, meaning unlocked CPUs. Since the Haswell refresh will not target the current 4670K and 4770K products, this will be very appealing for overclockers. The Broadwell-K CPUs are said to come with the 128MB on-package eDRAM codenamed Crystalwell as well as the full GT3 GPU. This not only makes the integrated GPU blisteringly fast, it also gives an uplift to CPU-bound software, as the eDRAM works like an L4 cache. This product is basically what enthusiasts already wanted with Haswell, but for various reasons Intel limited Crystalwell to certain products.

Before the mainstream platform gets its enthusiast love, those seeking the highest levels of performance will get the Haswell-E treatment on a revised socket 2011 platform (LGA 2011-3). The new socket won’t be compatible with the previous one, which could take both Sandy Bridge-E and Ivy Bridge-E processors. One of the reasons for this is the support for DDR4 memory (allegedly a mediocre 2133 MT/s at launch), a first for Intel and the industry. With this platform Intel is going to bump the core count to eight for products aimed at the desktop segment. Along with it comes the X99 chipset, which will have a feature set similar to that of X79.

On the server side of things, Haswell-EP is going to beef up things towards the end of 2014. Given how product cycles work in this segment, the actual impact will only be made in 2015 realistically. Haswell-EP will feature up to 15 cores and up to 35MB of L3 cache. Haswell-EP will be branded as Xeon E5 v3. There will also be Haswell-EX with up to 20 cores and 40MB L3 cache, but not much is known and it could possibly only come in 2015.

Earlier in 2014 the highest-end server line, Xeon E7, is scheduled to get an update based on Ivy Bridge-EX. Previously Xeon E7 was based on the Westmere-EX core launched in 2011. The new Ivy Bridge based Xeons will have up to 15 cores, 37.5MB of shared L3 cache and TDPs ranging from 105W to 155W. Xeon E7 provides additional reliability, availability and serviceability (RAS) features over the E5 line as well as scalability up to eight sockets. Alongside these Xeon E7 processors, we also expect Intel to launch the 4-way variants of the Ivy Bridge-EP based Xeon E5 v2. Last year Intel launched only the uni- and dual-processor variants. The number of supported sockets is denoted by the first digit of the processor number.
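The numbering rule is simple enough to express in code. The following is a small sketch that reads the socket count from a Xeon E5/E7 model string; the function name is made up and the model numbers are merely illustrative examples that are not taken from the roadmap discussed here.

```python
import re


def max_sockets(model):
    """Return the supported socket count encoded in a Xeon E5/E7 model name.

    The first digit of the four-digit processor number gives the socket count.
    """
    match = re.search(r"E[57]-(\d)\d{3}", model)
    if match is None:
        raise ValueError(f"not a recognized Xeon E5/E7 model: {model!r}")
    return int(match.group(1))


# Illustrative model names, chosen only to demonstrate the scheme.
for name in ("Xeon E5-2697 v2", "Xeon E5-4650 v2", "Xeon E7-8890 v2"):
    print(f"{name}: up to {max_sockets(name)} sockets")
```

The pattern deliberately covers only the E5 and E7 families mentioned in the text; other product lines use different numbering conventions.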

Intel’s small processor cores will also get an update at some point during the year. The follow-up to the Silvermont architecture, found in Bay Trail products for tablets and notebooks and in the Avoton server product, is a 14nm shrink of the architecture called Airmont. While we don’t know many specifics, we can expect a launch in the Q3-Q4 timeframe. We also don’t know if it will be a plain shrink (i.e. a tick) or if there will be new features as well. The only thing Intel has disclosed is that the GPU will get an update. If we read the codenames right, the first incarnation will be Cherry Trail, the successor to Bay Trail and thus targeting tablets.

One possible new feature of Airmont is the re-introduction of Hyper-Threading. Around IDF Intel was very guarded about Hyper-Threading on this architecture, but Francois Piednoel pointed out to us, without commenting further, that Hyper-Threading was never introduced in the first generation of a new architecture. We also know that the 14nm successor to Xeon Phi, dubbed Knights Landing, will be based on an augmented Silvermont architecture and feature 4-way Hyper-Threading (4 threads per core). So the possibility certainly exists, but since Knights Landing is scheduled for 2015, it could take a bit longer too. One thing we are certain of though: if it resurfaces, most products will only implement the 2-way variant that is already available on a range of products.

Before that happens though, Intel will refresh its smartphone lineup with Merrifield. It is very plausible that Intel will announce it around the Mobile World Congress, taking place from the 24th to the 27th of February in Barcelona, Spain. Merrifield is projected to bring 70% better CPU performance and double the GPU performance of Intel’s current Medfield smartphone platform. Merrifield will start out in dual-core variants, with quad-core models codenamed Moorefield following in the second half of the year, which are supposed to bring a roughly 2X performance improvement.

For the first half of 2014 Intel also announced the XMM 7260 LTE Advanced modem, which supports carrier aggregation, faster network speeds (up to 300Mbps, CAT6), both TD-LTE and TD-SCDMA, and operation in a large number of bands (17 LTE FDD bands, 5 TDD bands). It is the follow-up to the XMM 7160 launched last year. Late in the year Intel will also introduce its first integrated phone SoC, codenamed SoFIA, with a 3G baseband included. It is aimed at budget markets and, oddly enough, manufactured at TSMC for cost reasons. We previously reported on Intel’s mobile plans as announced at its Investor Meeting in November 2013.

MediaTek

When it comes to MediaTek, a lot is known about their plans for early 2014 and what we can expect from the company in terms of SoCs and mobile modems, but little is known about the latter part of the year. In late 2013, they announced their MT6592 “Octa-Core” mobile SoC, which is a true eight-core SoC with eight ARM Cortex-A7 CPU cores designed for low-power performance. It is capable of decoding 4K video content as well as supporting a 16 megapixel camera and a full HD display. It also has full connectivity capability for dual-band 802.11n WiFi, Miracast, Bluetooth, GPS and an FM tuner. Their GPU of choice is the ARM Mali-450 series GPU, which should get them a pretty satisfactory level of performance. Additionally, MediaTek claims a certain level of Heterogeneous Multi-Processing (within the CPU), which they call HMP. They also claim that they are using a Heterogeneous Computing architecture, utilizing a whole host of components of the SoC to properly balance the workloads. However, even though MediaTek is part of the HSA Foundation, they currently have no products officially announced or planned that implement the HSA spec.

On the modem front, MediaTek appears to be going after Qualcomm with their multimode LTE modem chipset. This chipset is by no means the latest or greatest, but it supports technologies currently available and will likely do so at a lower cost than Qualcomm. Their MT6290 supports LTE Cat4 (release 9), DC-HSPA+, WCDMA, TD-SCDMA, EDGE and GSM/GPRS. It will be manufactured on a 28nm process, although they never disclosed who the fab partner would be; TSMC is the likely candidate. If not, then it would likely be Globalfoundries or UMC, as there aren’t many fab options out there.

They also announced plans for CDMA2000 and “Worldmode” chipsets, which would encompass CDMA2000 and LTE technologies in the same mobile chipset. MediaTek is collaborating with VIA Telecom in order to enable MediaTek’s SoCs to support global roaming of voice and data services between the US and China. The global roaming capabilities would cover CDMA2000, LTE (both FDD and TDD), DC-HSPA+, UMTS, TD-SCDMA and GSM/EDGE. These new global chipsets will also feature software-upgradable VoLTE capability once carriers start to implement VoLTE across their networks. Samples of these SoCs will be available in the fourth quarter of 2014, but won’t ship in products until early 2015.

Such a chipset would allow MediaTek’s SoCs to compete with Qualcomm’s in the Chinese market and possibly even in the US market, if a company utilizes their SoC and decides to enter the US market with a “global” device. A good example could be someone like Xiaomi.

In addition to SoCs and wireless modems, they also announced what they claim to be the world’s first multimode inductive and resonance wireless charging solution. This solution aims to gain the benefits of each charging method based on the best use case at that time. Few details were given about when they expect to implement their hybrid solution in final products, but this would be a challenge to Qualcomm’s own wireless charging efforts. By the looks of it, though, this appears to be a very long-term play for MediaTek, and they want to make sure that resonance gets into the standards that currently exist in order to gain more traction. MediaTek actually has a very good and detailed document that explains the problems in wireless charging and what they seek to accomplish with their solution.

NVIDIA

We do know that for 2014 NVIDIA has the Maxwell GPU on their roadmap. Unfortunately we don’t know much beyond that. However, we have a few data points that we would like to present on this matter. First of all, a very likely introduction date for the technology is the GPU Technology Conference (GTC) taking place from March 24th to 27th in San Jose, California. There have been other rumors about announcements in February or even at CES, but as far as we are concerned these are just rumors.

Technology-wise Maxwell is positioned to have higher performance per watt, as the roadmap slide from GTC 2013 shows. This doesn’t come as a surprise as it is pretty much the only way to improve GPU performance over the previous generation. As we already discussed in the AMD section, GPU makers nowadays are mostly constrained by power and die space. This is basically the wall NVIDIA ran into with GK110, which consists of 7.1 billion transistors occupying 551mm². It is safe to say that this is the most complex GPU built to date. AMD’s Hawaii is a bit smaller but is thermally more constrained. In general both companies have extracted the most they could out of 28nm technology with the respective architecture they have at their disposal.

Now the logical step would be to look for the next technology step, which would be TSMC’s 20nm node. While we don’t want to rule out GPU manufacturing at other foundries, both major GPU makers have fared best with TSMC’s manufacturing so far. For sure 20nm technology will bring a size reduction, which allows more shader cores and other units to be put into the GPU and thus improves performance. The unknown as of yet is whether TSMC’s 20nm technology is also able to reduce power consumption. Some rumors allege this is not the case, but until we have hard evidence about this, we don’t want to paint a gloomy picture here.
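To get a feeling for the size reduction, here is a back-of-the-envelope sketch using the GK110 figures quoted earlier. It assumes ideal scaling, where die area shrinks with the square of the node ratio. That is an optimistic assumption: node names are not literal feature sizes, and real designs rarely achieve the full theoretical shrink.

```python
def ideal_area_scale(old_nm, new_nm):
    """Area scaling factor under ideal (quadratic) scaling. Assumption only."""
    return (new_nm / old_nm) ** 2


# GK110 figures from the text.
GK110_TRANSISTORS = 7.1e9
GK110_AREA_MM2 = 551.0

scale = ideal_area_scale(28, 20)
shrunk_area = GK110_AREA_MM2 * scale
density_28nm = GK110_TRANSISTORS / GK110_AREA_MM2 / 1e6  # million per mm²

print(f"Ideal area scale 28nm -> 20nm: {scale:.2f}")
print(f"GK110-sized design at 20nm:    {shrunk_area:.0f} mm²")
print(f"GK110 density at 28nm:         {density_28nm:.1f}M transistors/mm²")
```

Under this idealized model the same design would need roughly half the area, or conversely about twice the transistors would fit into the same die size. Whether those extra units can actually be powered and cooled is a separate question.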

Now back to Maxwell and how all this relates. If Maxwell is still made using 28nm technology, we can pretty much expect compute unit count to remain more or less flat. We can also expect clock rates to remain in a similar ballpark for the respective product positioning. Thus any performance improvements must come from an improved architecture. Now we do know that Kepler is already fairly efficient. Even if there is some performance left on the table to be reaped by optimization, I’d doubt it is all that much.

On the other hand, if Maxwell is made using 20nm, which it should be, the performance delta the product can achieve depends mostly on the quality of the manufacturing process. If there are indeed issues with power draw at this process, then 20nm will make it more economical for NVIDIA to make these GPUs as they will shrink in size, but they won’t be much faster. Yield is another issue to be wary of: if yield is low, availability will suffer, which helps neither the company nor its customers.

It could be that NVIDIA takes a safe approach and will first introduce entry to midrange GPUs on 20nm, where the impact if there are issues is less pronounced. Then the company could learn from that and gradually move to beefier GPUs, supported by the continuous improvements that are usually made in silicon manufacturing as the technology is ramped.

In the last few weeks there have been a few rumors roaming the net about a GeForce GTX 750 Ti based on a rumored GM107 chip, which would be a Maxwell generation chip. At this point it is hard to make a final judgement about these rumors. While performance-wise the name would possibly make sense, putting a Maxwell chip in the same series as the usual Kepler GPUs is a bit confusing. So instead of adding further fuel to this fire, we will simply wait for a more credible announcement from NVIDIA.

Tegra

In terms of Tegra, we already know what to expect for 2014, thanks to Nvidia’s press conference at CES. They’ve already talked about their goal to have a stronger role in the car with their current design wins as well as their Tegra K1 SoC. The Tegra K1 is a culmination of Nvidia’s Kepler GPU architecture and two different CPU architectures. So essentially the Tegra K1 is two entirely different SoCs, both with Nvidia’s Kepler desktop GPU architecture. The advantage of using this GPU architecture for Nvidia is that it enables them to finally support CUDA as well as OpenGL ES 3.0 and OpenGL 4.4, meaning that it is no longer constrained by mobile graphics APIs. Furthermore, it also supports DirectX 11.1, which should make integration into Windows RT tablets quick and easy.

What is interesting about the Tegra K1 is that we can expect mobile devices to utilize the 5-core ARM (quad A15 cores + one A9 core) CPU in products in the second half of the year, and we’ll likely see some form of design win at MWC. Nvidia has already tried to win the war of perception, though, by integrating a Tegra K1 into a Lenovo all-in-one desktop and having some websites benchmark it. To no one’s surprise, it vastly outperformed all other mobile SoCs because it was plugged into a wall socket, running in a desktop, and using active cooling. So any judgment on Tegra K1 performance should be reserved until the chip can be tested in a final device, such as a readily available tablet or smartphone.

The other interesting thing about Tegra K1 is the fact that the second flavor of Tegra K1 will feature two of Nvidia’s 64-bit Project Denver ARM cores. These cores are ARM v8 compatible and are designed to be “super cores” according to Nvidia’s own marketing materials. These cores are also going to be the bread and butter of any Nvidia efforts in the server market. While we don’t know many details about the ARM v8 Project Denver cores, we do know that they will be 7-way superscalar, clock up to 2.5 GHz and have 128K+64K L1 cache.

Both chips will be manufactured using TSMC’s 28nm HPM process, which means they will have to be pretty power efficient if Nvidia wants to be competitive at the same node as their competitors. More importantly, Nvidia will have to find a way to get design wins in a landscape where MediaTek, Intel and others are fighting to compete against Qualcomm in the mobile space.

Even more importantly, they haven’t shown anything about their modem designs with the new SoCs, and if they want to win any smartphone deals nowadays they’re going to have to have a modem too. Considering that both Qualcomm and Intel have modems, and Qualcomm’s is already integrated into most of its high-end SoCs, this is an uphill battle for Nvidia. Intel has already shown their roadmap for getting their modems integrated into their SoCs, but Nvidia has still failed to show anything other than T4i, which is actually a Tegra 3 SoC with an Icera i500 modem.

As far as we know, there have been zero T4i design wins and the only device that exists using T4i is Nvidia’s own Phoenix reference platform. The real truth is that we’re already in 2014 and T4i is essentially a 2011 chip with a die shrink and a possibly working Cat 3/Cat 4 LTE modem.

Qualcomm

When it comes to Qualcomm, we already have a fairly good idea of what to expect in the first half of 2014, while the latter half seems a bit murky.

The first chip that we know of is their Snapdragon 805, which was actually announced last year, but will be coming to market in the first half of the year. We can expect many design wins to be announced at MWC 2014 in Barcelona, but we already know that the ASUS Padfone X will likely be one of the first devices with it. The Snapdragon 805 is Qualcomm’s “Ultra HD Mobile Processor”, which they claim ticks a whole host of 4K check boxes. They claim it will be capable of Ultra HD gaming with the Adreno 420 GPU, which is a rather ridiculous claim unless they’re talking about 2D gaming, because there’s no real way that they could enable 4K gaming in a tablet with the amount of GPU horsepower they have. It already takes a fairly hefty pair of $500 desktop GPUs to accomplish such a task, so any claims of 4K 3D gaming are ludicrous at best. They also mention an H.265 hardware decoder done in the hardware decode block, r