When Intel announced that the company was working on Larrabee as its next-generation graphics part, almost everybody thought that Intel would kill ATI/nVidia with ease. After all, the company had knocked AMD off its feet with the Core architecture, and Intel felt as secure as ever.
Over the course of the last couple of years, I have closely followed Larrabee with on and off-the-record discussions with a significant number of Intel employees. As time progressed, the warning lights stopped being blips in the distance and became big flashing lights right in front of our faces. After discussing what happened at the Intel Developer Forum and the Larrabee demo with Intel’s own engineers, industry analysts and the like, there was no point in going back.
This article is a summation of our information on Larrabee: hundreds of e-mails and chats, a lot of roadmaps and covert discussions. When we asked Intel’s PR about Larrabee, the comment was that this story was "invented" and has nothing to do with the truth. We were also told that our sources were "CRAP", which was duly forwarded to the sources themselves. We will cherish the comments that ensued for the remainder of our days, including a meeting that followed the comment "since [Intel] PR claims we don’t work on LRB, this is a blue cookie". Also, there were some questionable statements about our integrity, but here at Bright Side of News* we are going to continue doing what we did in the past – disclose the information regardless of how good or bad it is. We hope that it is good, but if it’s not – don’t expect us to stay put.
Unfortunately for the PR, marketing, and sales divisions, every company owes its existence to engineers who pour their hearts into projects, and if it wasn’t for that – you would not have chips with hundreds, and now billions, of transistors in them. Engineers don’t speak in Ronspeak, but rather are quite open. This is what we would call a real inconvenient truth.
For a company of Intel’s stature, we did not expect that a project such as Larrabee would develop in the way it has. In fact, according to information gathered over the years – LRB just doesn’t look like an Intel project at all [read: sloppy execution, wrong management decisions]. The number of leaks we received over the course of the past years simply surprised us; on several occasions, we had the opportunity of seeing internal roadmaps and hearing about frustrations regarding unrealistic expectations from the management. First and foremost, the release dates: Intel’s internal roadmaps originally cited "sampling in 2008", then "Release in Q4 2008", "Release in 2009". By summer 2008, we saw "mid-2009", "H2 2009," changing to "2010", "H2 2010". After a recent conversation with several engineers who mentioned "we need another 13-18 months to get [Larrabee] out", the time came [unfortunately] to complete this story.
The Road to Larrabee
In a lot of ways, Larrabee had a lot of issues even before Intel had committed to the project. Intel pondered doing LRB for years, but the CPU-centric company was in love with the idea of a Pentium 4 reaching 10 GHz [a double-pumped ALU would work at 20 GHz then] and being strong enough to power graphics as well. This was the same line of thought that Sony took with the IBM Cell processor – originally, the PlayStation 3 was supposed to have two Cell CPUs at 4GHz each. After that idea went bust, nVidia got the hot potato of making a PS3 graphics chip with less than 12 months to launch. With Larrabee, we can’t exactly call Imagination Technologies the savior.
Just like the Cell peaked at 3.2 GHz and had numerous issues [bear in mind that every shipped PS3 has eight SPE units, out of which only seven are used], Intel had an internal war about Tejas [successor to Pentium 4, next step of the NetBurst architecture]. The fallout in the company was huge – even Intel’s die-hards realized that the NetBurst architecture had turned into a NetBust and gave a free hand to the Israeli team who worked on the Pentium M processors [central part of the Centrino platform]. During those wars, we heard the slogan "Megahertz is Megahurtzing us", especially in sales discussions versus a very hot topic at the time, AMD Opteron. There is also an internal story that touches on the power struggle that was happening at the time, but we will leave that one for another time. Let’s just say that there was a reason why a certain executive started pushing Larrabee like there’s no tomorrow.
After the CPU side of the company put its finishing touches on the Core architecture, Intel was sure that Core was the ticket to ride and started to turn its focus to becoming a real platform company by products as well, not just by PowerPoint slideware. Paying a significant fee to a licensing company [Imagination Technologies] so that you can use their power-efficient, but "performance shortfalling" technologies [PowerVR] in low-end netbook and notebook chipsets is somewhat ludicrous if you’re a company that designs ASICs [Application Specific Integrated Circuits] that get sold in hundreds of millions of units per year. Now, the performance shortfall is not exactly Imagination Technologies’ fault, as we all thought. The problem is that Intel hired a 3rd party vendor called Tungsten Graphics [now a wholly owned subsidiary of VMware Inc.] to create the drivers for the parts. The problem with those drivers is the fact that "GMA500 suffers from utterly crappy drivers. Intel didn’t buy any drivers from Imagination Technologies for the SGX, but hired Tungsten Graphics to write the drivers for it. Despite the repeated protest from the side of Imagination Technologies to Intel, Tungsten drivers DO NOT use the onboard firmware of the chip, forcing the chip to resort to software vertex processing." There you have it folks, the reason why "Intel graphics sux" is not exactly hardware, but rather doubtful political decisions to have a VMware subsidiary write drivers that force the CPU to do the work. We remember when Intel used this as a demo of performance differences between Core 2 Duo and Quad, but the problem is – it is one thing to demonstrate, it is another to shove that onto your respective buyers.
Secondly, Intel decided to use the PowerVR SGX535 core, which can be found in Apple’s iPhone, Nokia N900, Sony Ericsson XPERIA and similar smartphones. If the company had opted for the SGX545, we would probably see netbook and notebook parts [ones based on the mobile 945 chipset] beating the living daylights out of Intel’s desktop chipsets that use Intel’s old GenX graphics hardware.
As a consequence of underpowered hardware and questionable driver decisions, Intel was always ridiculed by game development teams and cursed whenever a publisher forced a game to support Intel’s integrated graphics – resulting in the paradox of having the best CPU and the worst GPU [several key developers warned us about never writing GMA graphics as a "GPU"]. In fact, during the preparation of this story, a certain affair involving GMA graphics and driver optimizations in 3DMark Vantage broke out courtesy of Tech Report. Tim Sweeney of Epic Games showed little courtesy towards Intel’s graphics capabilities, commenting that "Intel’s integrated graphics just don’t work. I don’t think they will ever work." But that statement could be considered a courtesy compared to his later statement: "[Intel] always say ‘Oh, we know it has never worked before, but the next generation …’ It has always been the next generation. They go from one generation to the next one and to the next one. They’re not faster now than they have been at any time in the past."
Bear in mind that Intel also showed open doors to developers, as Tim Sweeney is now knee deep in creating Unreal Engine 4.0, one of the first engines that will utilize Larrabee… when it gets out. Naturally, direct access to hardware will apply to AMD and nVidia graphics too, with the Fermi architecture now supporting native C++ code execution. But that’s another story…
The idea of Larrabee finally came to fruition towards the middle point of this decade. Intel started to hire people around the globe, with a lot of focus on sites in Oregon, California and Germany. In order to build a highly complex chip, you have to rely on several teams working on various aspects, and perhaps this might be the reason for the current state of the Larrabee project and the slippage in the roadmaps by some two years, with a potential for further slip.
"CPUs are easier to make than GPUs"
Now, first and foremost, we have to disclose that it is excruciatingly hard to create a graphics processor. Even though some skeptics will say that "you just build one shader unit and then multiply it by an X factor", that is, frankly, a load of bull. Today’s graphics processors are massively parallel beasts that require two factors to work: drivers and massively parallel hardware. This was confirmed to us by engineers at ATI, nVidia and Intel – so forget about picking sides here.
The DirectX 11 generation graphics parts from ATI and nVidia feature wide extensions to the chip microarchitecture itself, and saying that it is easy to create such a chip is, again – a load of bull. GPUs and CPUs operate on completely different sides of the computing scale. The CPU is optimized for random operations simply because it cannot predict what it will calculate, while the driver does most of the work for the GPU and just queues hundreds of thousands of instructions waiting to be churned out. Thus, a CPU needs a shed load of cache; GPUs do not [unless you want to use them for computational purposes].
Intel LRB was designed to go head to head against these two, but the part… especially the right one.
As both are moving in the same direction and becoming massively parallel beasts, the GPU is gaining cache and bandwidth. For example, ATI’s Radeon 5870 GPU comes with 160KB of L1 cache, 160KB of Scratch cache and 512KB of L2 cache. The L1 cache features a bandwidth of around 1TB/s, while L1 to L2 cache bandwidth is 435GB/s. This flat out destroys any CPU cache bandwidth figures, and we’re talking about a chip that works at "only" 850 MHz. Recently, SmoothCreations launched a factory overclocked card at 950 MHz for the GPU, pushing the bandwidth figures to over 1.1TB/s for L1 and almost 500GB/s for L1 to L2 cache speed. Bear in mind that this is a 40nm part.
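The overclocked figures quoted above follow from simple clock scaling. Here is a minimal sketch, assuming cache bandwidth scales linearly with the GPU core clock and treating "around 1TB/s" as 1000GB/s; the function name is ours, purely for illustration.

```python
def scale_bandwidth(base_gb_s: float, base_mhz: float, new_mhz: float) -> float:
    """Scale a bandwidth figure linearly with the core clock (an assumption)."""
    return base_gb_s * new_mhz / base_mhz


# Stock Radeon 5870 figures from the article, at 850 MHz.
L1_STOCK_GB_S = 1000.0      # "around 1TB/s", taken here as 1000 GB/s
L1_TO_L2_STOCK_GB_S = 435.0

# Factory overclocked card at 950 MHz.
l1_oc = scale_bandwidth(L1_STOCK_GB_S, 850.0, 950.0)
l1_to_l2_oc = scale_bandwidth(L1_TO_L2_STOCK_GB_S, 850.0, 950.0)

print(f"L1: {l1_oc:.0f} GB/s, L1 to L2: {l1_to_l2_oc:.0f} GB/s")
```

Under those assumptions the results land at roughly 1.12TB/s and 486GB/s, in line with the "over 1.1TB/s" and "almost 500GB/s" quoted for the 950 MHz card.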
On the other side of the fence, nVidia recently announced its Fermi architecture, better known as the architecture that will end up in GT300/NV70/GF100 chips. The cGPU based on the Fermi architecture features 1MB of L1 cache and 768KB of L2 cache. One megabyte of L1 cache is more than any of the higher-volume CPUs that are currently being shipped, just look at the CPUs below:
- AMD Quad-Core Shanghai = 512KB L1 [64KB Instruction + 64KB Data per core]
- AMD Sexa-Core Istanbul = 768KB L1 [64KB Instruction + 64KB Data per core]
- Intel Quad-Core Nehalem = 256KB L1 [32KB Instruction + 32KB Data per core]
- Intel Sexa-Core Dunnington = 384KB L1 [32KB Instruction + 32KB Data per core]
- Intel Octal-Core Nehalem-EX = 512KB L1 [32KB Instruction + 32KB Data per core]
Furthermore, nVidia features a cluster of 32 Fused Multiply-Add capable cores that can handle Integer or Floating-Point instructions. In comparison, Intel will support Fused Multiply-Add [FMA] with Larrabee as a cGPU and the 2012 Haswell architecture as a CPU.
Now, does this sound easy to make? If it was easy, we would not go from almost 80 GPU companies at the beginning of the 21st century to eight [ATI, ARM, Imagination Technologies, nVidia, S3, SiS, Matrox], with only two making discrete products in serious volumes in the desktop and notebook segments, and two licensing very serious volumes in the handheld business. Even though it owns 50% of the world-wide graphics market, again – we cannot consider Intel to be a customer-oriented "GPU" vendor, given the performance of their parts. Just ask Microsoft how many waivers Intel hardware has in their certification process [hint: the number is higher than nVidia’s and ATI’s worst non-compliant hardware combined]. We know the number and sure thing is, it ain’t pretty.
Thus, Intel knew what the company had to do – or risk becoming a dinosaur in an increasingly visual world. Now, the company knew the road to Larrabee would be difficult. The only problem is that Intel’s old-school thinking underestimated the size of the task at hand and the time that it would take to complete such a project.
AMD’s reaction: We’ll merge with nVidia – a marriage that never happened
AMD’s reaction to Intel’s split to CPU and cGPU and future fusion parts was quite simple: Hector J. Ruiz and his executive team began to discuss a merger with nVidia, which ultimately fell through in the second half of 2005. AMD knew nVidia’s roadmaps just like nVidia knew AMD’s, thanks to the now-defunct SNAP [Strategic nVidia AMD Partnership], formed in order to get the contract for the first Xbox. A few weeks after those negotiations went under [Jen-Hsun’s major and unbreakable requirement was a CEO position, which Hector refused], AMD started to talk about the acquisition of ATI which ultimately became a reality a few months down the line [July 2006].
If this merger had gone through, there is little doubt that today’s ION chipset would be an MCM [Multi-Chip Module] and the world of netbooks would probably look a whole lot different [remember whose hardware was inside the world’s first netbook?]. But AMD went with a less aggressive company and the strategy is paying off now.
The real question of whether AMD overpaid for ATI Technologies Inc. can only be answered once the cost of Intel building Larrabee on its own becomes a matter of public knowledge. Over the past few years, we heard several different calculations, with almost each and every one being well over a billion dollars. The worst case that we heard was "we burned through three billion USD", but that belongs in the speculation category. Do bear in mind that the figures aren’t coming from the bean counters and that the cost of slippage cannot be calculated yet.
Intel Larrabee specifications
One of the early Intel Larrabee PCB board layouts – not much has changed on the current prototypes
If you are wondering what Larrabee’s specifications are, the project’s goal was manufacturing a chip with 16 cores and 2-4MB of L2 cache clocked at 2 GHz, all packed into a 150W power envelope [for the chip alone]. The chip was supposed to be manufactured in the well proven and paid-out 45nm process technology. In fact, the only thing that keeps the bean counting dogs away is the fact that Larrabee will be manufactured in 45nm with all the factory investment paid out, as the company has more wafers than some wafer suppliers. Still, our sources pitched the cost of a single chip [with packaging] at around $80.
The memory controller resembled ATI’s R600: "1024-bit" internal ring-bus [two way 512-bit] with over 1TB/s of bandwidth connecting to eight 64-bit memory controllers [512-bit] that control 1GB of GDDR5 memory clocked at 1.0 GHz [4.0 GT/s], for an external bandwidth of 256GB/s. As it turned out, the available bandwidth of video memory turned into a problem later in the development.
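The external bandwidth figure can be checked with the usual bus-width arithmetic. The sketch below assumes peak bandwidth is simply bus width in bytes multiplied by the transfer rate; the function and variable names are illustrative, not anything from Intel.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_gt_s: float) -> float:
    """Peak memory bandwidth: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_gt_s


CONTROLLERS = 8            # eight memory controllers, per the article
BITS_PER_CONTROLLER = 64   # each 64 bits wide
GDDR5_RATE_GT_S = 4.0      # 1.0 GHz GDDR5, quoted as 4.0 GT/s

bus_bits = CONTROLLERS * BITS_PER_CONTROLLER
bandwidth = peak_bandwidth_gb_s(bus_bits, GDDR5_RATE_GT_S)

print(f"{bus_bits}-bit bus at {GDDR5_RATE_GT_S} GT/s: {bandwidth:.0f} GB/s")
```

Eight 64-bit controllers give the 512-bit external interface, and 64 bytes per transfer at 4.0 GT/s works out to the quoted 256GB/s. This is a theoretical peak; sustained figures would be lower.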
Looking at the execution core itself, it was in-order x86 [not going into tech overdrive, we can tell you that the principle is vaguely "similar" to the one Intel uses in the Atom CPU] capable of handling four threads on-the-fly [the Nehalem architecture supports two threads – Hyper-Threading]. This was a bastard child of the improved Pentium MMX core [P55c], but laid out in "in-order execution" fashion. A 16-wide Vector SIMD [Single Instruction Multiple Data] unit carries the heaviest burden – it is capable of handling 16 32-bit [512-bit] Advanced Vector Extensions operations. In order to have everything working properly and avoid starvation, the cores were interconnected with the aforementioned ring-bus controller. Internally, the Larrabee core features 64KB of L1 cache [32KB Data, 32KB Instruction] just like Nehalem processors. As performance simulations commenced, it was clear that 16 x86 cores with AVX extensions would not attain the performance needed to reach projected GPUs from ATI and nVidia in 2008-2009, thus the roadmap was expanded with 24, 32 and 48 core parts and the L2 cache was kept at 256KB per core. In the case of 32 cores, the Larrabee cGPU should have 8MB of cache, while the 48 core version should have 12MB. The 24 core version would still have a physical 8MB of cache, but it remains to be seen whether that means the 8MB is accessible to all 24 cores or whether every core keeps its 256KB and "that’s that". According to the sources we spoke with, the "size of L2 cache is not an issue. We are the industry benchmark for SRAM cache density and we can put as much cache as we want. Nobody can touch us [in that perspective] and everybody knows that."
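The cache totals and the vector width above are straightforward multiplications. A small sketch, assuming every enabled core contributes exactly one 256KB L2 slice; the helper names are ours.

```python
L2_PER_CORE_KB = 256  # per-core L2 slice quoted in the article


def total_l2_mb(cores: int, per_core_kb: int = L2_PER_CORE_KB) -> float:
    """Total L2 in MB if each enabled core keeps its own slice (an assumption)."""
    return cores * per_core_kb / 1024


def simd_width_bits(lanes: int = 16, lane_bits: int = 32) -> int:
    """Width of the vector unit: number of lanes times bits per lane."""
    return lanes * lane_bits


for cores in (16, 24, 32, 48):
    print(f"{cores} cores: {total_l2_mb(cores):.0f}MB of L2")

print(f"Vector unit: {simd_width_bits()} bits wide")
```

This gives 4MB, 6MB, 8MB and 12MB for the 16, 24, 32 and 48 core parts. It also frames the open question about the 24 core part: 8MB physically on the die, but only 6MB reachable if each core is limited to its own 256KB.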
This is the plan for Intel Larrabee on the HPC side: CPU with mostly LRB cores for capturing HPC contracts.
To disclose some figures, Intel was targeting 1TFLOPS of computing power with 16 cores, but was unable to reach that target, hence the 50% increase in units for the baseline part. The 24 and 32 core parts were planned to use the 45nm process node, while the 48-core version would come out in 32nm. Please do note that the 24 core version was actually the 32-core version, but with some faulty cores. The 8 and 16 core parts would be integrated inside the CPU as a part of tock version of Haswell architecture [follow-up to Sandy Bridge and Ivy Bridge – Sandy Bridge would have the final GMA-based part, swapping for Larrabee core in a 22nm refresh] and "AMD would be done".
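To see where a 1TFLOPS target for 16 cores could come from, here is a back-of-the-envelope sketch. It assumes the conventional peak formula (cores x SIMD lanes x 2 operations per fused multiply-add x clock) and the 2 GHz, 16-lane figures given in the specifications above. Our sources did not tell us how Intel counted, so treat this as an illustration, not Intel’s own math.

```python
def peak_gflops(cores: int, simd_lanes: int, clock_ghz: float,
                flops_per_lane: int = 2) -> float:
    """Theoretical single-precision peak; an FMA is counted as two operations."""
    return cores * simd_lanes * flops_per_lane * clock_ghz


SIMD_LANES = 16   # 16-wide vector unit, per the article
CLOCK_GHZ = 2.0   # original clock target, per the article

for cores in (16, 24, 32, 48):
    gflops = peak_gflops(cores, SIMD_LANES, CLOCK_GHZ)
    print(f"{cores} cores: {gflops:.0f} GFLOPS peak")

# Going from 16 to 24 cores is the 50% increase in units for the baseline part.
unit_increase = 24 / 16 - 1
print(f"Unit increase: {unit_increase:.0%}")
```

On paper, 16 cores at 2 GHz give 1024 GFLOPS, which lines up with the 1TFLOPS target. The catch is that this is a peak figure, and silicon that misses its clock target or starves its cores lands below it.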
That was the plan. Now, let’s see where the Larrabee train derailed and where it is right now.
Things going wrong with Larrabee… in 2007?
We got the first warning flag back in 2006, while discussing the matter with people in different groups who were working on Larrabee. We received e-mails with quotes such as "the group in Braunschweig, Germany are seriously a bunch of dip-shits with undeserved chips on their shoulder! It’s really a corrupt corporate culture in this office."
Now, if this was one disgruntled engineer, it would be natural for us to discard the information on grounds of bias. But as time went by, we received bits and pieces with more worrying content. Back in 2007, when Larrabee started to take physical shape, we heard some very worrying statements coming from the people that were coming and going from the team. Most worrisome was the issue with the memory controller – "people involved in designing high performance memory controllers don’t even understand the basic concepts of pipelining and they don’t understand how to read a memory spec. It is completely ridiculous."
Given the technological changes with the GDDR5 memory, which was selected for Larrabee back then, we knew that the memory controller had to be seriously improved upon compared to controllers that support GDDR3 or GDDR4 memory. As I wrote on my blog a long time ago, GDDR5 memory is really "a different cookie" than its predecessors. While GDDR3 and GDDR4 were based on the DDR2 memory standard, GDDR5 took a different route and only shares a few similarities with the conventional DDR3 memory.
Additionally, we were constantly meeting with various people in the Larrabee project and projects around Larrabee. Traveling long distance on the same flights as Larrabee engineers also helped a great deal. As engineers were trekking on the traditional USA-Germany-India route, it was interesting hearing their thoughts and concerns. In fact, concerns were commonplace. At that time, Intel was under severe stress from all the investigations going against the company, and people that were contracted started to feel uneasy. For instance, "the only thing that silences contractors is the absurd salaries they are paying for the services of being scapegoats for the German Intel Employees – it’s something like 150K-200k/year, with German tax evasion services included in the contract deal courtesy of www.internationaltaxsolutions.com." When we received this e-mail, we contacted several [both former and present] contractors at Intel. Upon showing this e-mail to the persons that were in the know, we got confirmation that such deals were in place and were even shown several contracts from the US. Needless to say, the situation got really complicated.
This was all followed by various e-mail and voice conversations with Intel’s engineers, who truly believed their baby would see the light of day. However, any talk about performance was quickly muted, with quotes such as "we have to get our baby out the door. First generation is building ecosystem anyways?" "We need to learn from our mistakes on first generation and create a world beating Gen2". Another source argued that "1st gen is proof of concept. 2nd gen is fixing that proof. 3rd gen is the sh*t."
Naturally, at this stage of development, we were told to expect the chip sometime in early 2010. However, in a conversation that took place at IDF 2007 in San Francisco, a certain Intel executive chit-chatted with several journalists and analysts and was caught saying "In May 2009, we’ll ship a processor that will have one dual-core CPU die in 32nm and Larrabee die in 45nm." When we heard that and asked our sources for clarification, there was a chain of e-mails coming from Larrabee teams and quite frankly, for the first time in years, we saw something that can only be described as panic. Our olive branch was that maybe the exec was wrong by a year, but that was also shot down with a "Theo, that’s 2012." statement. That unfortunate statement was one of the reasons for numerous leaks about what’s going on with Larrabee.
Getting back to Larrabee, at one point in the last 18 months, the engineers saw that the dual 512-bit ring-bus couldn’t scale with the number of cores, and one of the plans was to implement multiple paths, with each ring-bus taking care of 8-16 cores. That was a disastrous idea from day one, and it is a good thing that it didn’t take off. But struggling with elementary parts of the chip so late in the process gives you that "this is when the s*** hit the fan" feeling. The 32-core silicon was [and still is] prone to numerous issues, such as cache coherency problems and cores starving for instructions, and overall, Larrabee looked like a mess. Bear in mind, nowhere near as much of a mess as some GPUs in the past from both ATI and nVidia [starved nVidia NV30 and ATI R600 come to mind].
We were hearing what was going wrong all the time, but whenever we heard that, it was almost 100% coming from ATI or nVidia, so naturally we would dismiss the accusation as typical FUD. But that was nothing compared to what happened after.
"GPU is dead" and the war with nVidia
In the spring of last year, a certain Intel engineer stated that the "GPU is dead", a statement which was reiterated by now former Intel exec Pat Gelsinger at IDF Spring 2008, held in the city of Shanghai. Pat took charge of Larrabee and was certain that this architecture was the future of Intel. We agree 100% with Pat that this is the future of not just Intel, but AMD as well, as Larrabee is a merger between the CPU and the GPU. This applies to nVidia as well, but that’s another topic.
Now former Intel VP Pat Gelsinger showcasing a Larrabee wafer a year after the infamous IDF China keynote
The only problem is, Intel sparked a war with nVidia without even having working silicon [ok, silicon capable of displaying a picture]. And that was a big mistake. The moment Jen-Hsun saw the comments made by Intel engineers and later statements by Intel execs at IDF Spring 2008 in Shanghai, Jen-Hsun "opened a can of whoop-ass" on Intel. Luckily for Intel, Jen-Hsun didn’t have the GT300 silicon either, but GT200 was at the gates.
The strained relationship between the two got into a state of war when Intel started talking to OEMs and claiming that nVidia does not have the right to create chipsets for Nehalem [QPI – QuickPath Interconnect] and Lynnfield [DMI – Direct Media Interface]. Upon request, we were shown a cross-license deal between Intel and nVidia. I am not going to disclose which side showed it to me, since technically – the source did something it wasn’t supposed to do.
The wording in the original document, as far as my layman’s understanding goes, does not bar nVidia from making chipsets for Intel even after the Front Side Bus is dead, because both QPI and DMI qualify as a "processor interconnect", regardless of what either party is saying.
Intel filed a suit against nVidia in Delaware court [naturally, since both companies are incorporated in the "Venture Capital of the World" state], claiming that nVidia doesn’t hold the license for CPUs that have an integrated memory controller. nVidia didn’t stand back and filed a counter-suit; this time around, nVidia wanted the cross-license deal annulled and to stop Intel from shipping products that use nVidia patents.
If you wonder why this cross-license agreement is of key importance for Larrabee, the reason is simple: without nVidia patents, there is no Larrabee. There are no integrated chipsets either, since they would infringe nVidia’s patents as well. Yes, you’ve read that correctly. The Larrabee architecture uses some patents from both ATI and nVidia, just like every graphics chip in the industry. You cannot invent a chip without infringing on patents held by other companies, thus everything is handled in a civil manner – with agreements. We heard a figure of around several dozen patents, touching Larrabee from the way the frame buffer is created to the "deep dive" called the memory controller. If you end up in court, that means you pulled a very wrong move, or the pursuing company is out to get you. If a judge were to side with nVidia, Larrabee could not come to market and, well, can you say – Houston, we have a problem?
Intel had the luck of AMD snatching ATI – Intel and AMD have a cross-license agreement that allows for technology to transfer both ways – Intel had no problems getting a license for Yamhill i.e. AMD64, 64-bit extensions for their CPU architecture, and equally should have no issues using the ATI patent portfolio [ATI and Intel already had an agreement]. My personal two cents would go on Intel giving an x86 license to nVidia in exchange for a patent cross-license, but only time will tell how the situation will develop. However, there is a reason why Bruce Sewell "retired" from arguably the best or second best legal post in the industry [IBM or Intel, we’ll leave you to pick] and then showed up at Apple two days after that "retiring" e-mail.
All that this unnecessary war created was additional pressure on engineers, who had to check and re-check their every move with Larrabee, causing further delays to the program. We completely understand these people – these chips are their babies. But the additional legal pressure caused some people to leave. This is nothing weird – with projects of this size, people come and go.
But Larrabee was coming, and it was coming without the "it works" component.
Plan B: Intel continuously increasing share in Imagination Technologies
If you followed financial transactions in the past 18 months, you could witness an interesting trend: both Apple and Intel started to increase their share in Imagination Technologies. When it comes to Apple, we are not surprised, given that their bread and butter [iPhone, iPod] use PowerVR graphics technology. But for Intel, a company that is investing almost unimaginable amounts of money in creating an x86 GPU, a continuous increase of share in Imagination Technologies and constant people transfers – just seemed a bit odd to us. Yes, it is true that it may be just that Intel does not want Apple to take over the company or increase the share to get significant voting rights, but again, that can only be speculated upon.
Intel didn’t increase its share in VIA before it started an all-out war with them back in the early 2000s and the company definitely isn’t buying shares in nVidia. So, why would Intel increase ownership in its future rival if the company is so sure of its success with the Larrabee project?
On one side, one might argue this is to limit ARM’s deadly combo of an ARM core and a PowerVR SGX graphics core, but if you’re Intel – where’s the Chipzilla confidence? Is it possible that Intel is afraid of the efficiency and performance set by Series5? And we won’t even go into PowerVR’s Series6, set to debut next year.
We are going to leave you to conclude that one. But just bear in mind that Sandy Bridge [32nm, octa-core], the new microarchitecture that will succeed the 45nm tock Nehalem and its 32nm tick Westmere – uses iGFX, a graphics subsystem featuring the same Intel tech [besides the already mentioned nVidia patents, GMA relies on the Imagination Technologies IP portfolio]. For instance, the Intel Atom platform consists of an Atom CPU and the 945GMC chipset that features a PowerVR SGX 535 graphics core.
Intel’s IDF Larrabee demo
During IDF Fall 2009, held a few weeks ago, Intel showed Larrabee in working form for the first time in history. During early 2009, we saw Larrabee as a wafer in the hands of Patrick P. Gelsinger, but no working parts. At IDF Fall 2009 in San Francisco, the system was shown running Enemy Territory: Quake Wars on Larrabee silicon. But there was no Pat hosting the tech part of the keynote.
There is a big but – that version wasn’t the standard Enemy Territory: Quake Wars you can buy in stores, but rather a raytraced version that was demonstrated last year while running on a 16-CPU core Tigerton system. Raytraced ET: Quake Wars is a project by Daniel Pohl [Quake 3 and 4 Raytraced guy] and the team over at Intel Santa Clara. The "baby Larrabee", as Intel engineers love to call it, made its baby steps running at lower framerates than the 16-Core Tigerton system from Research@Intel Day 2008, and ran at a somewhat lower resolution. But this was expected with such early silicon.
Intel Larrabee prototype board at IDF Fall 2009. Not much has changed from the original PCB scheme
For a chip that promises DirectX 11 and OpenGL compatibility, seeing a CPU demo wasn’t exactly a reason to go and jump through the hoops. The demo nicely showed that Larrabee is a chip made out of CPU cores with AVX support that can run CPU code. This demo was not an example of how Larrabee silicon runs. The Larrabee IDF demo was all about the software team at Intel making a compiler that makes game code run on the CPU and the GPU at the same time – and that’s a testament to the dedication the software team has.
However, there is a mountain to climb, since this is valid for new applications only. Intel knows the company cannot sell the product without support for OpenGL, OpenCL and DirectX. Building 100 million or so lines of code for the driver part is a herculean task, but the sources at hand claim that they are working on target – and warn that the hype simply came too soon. Contrary to belief, the IDF 2009 demo was not a major milestone for the Larrabee hardware team. They can’t solve some of the hardware issues, and those issues will require a massive re-work. Larrabee teams have to deliver silicon that won’t just sit on a 32nm sexa-core Westmere processor and run a single app. The "Larrabee for Graphics" story waits on the software development team, and that is where Larrabee will make or break [into 2011].
The demo didn’t impress the audience – prior to IDF, back at AMD’s Evergreen event, we heard comments coming from industry analysts that didn’t have a lot of nice words in store for the project LRB. But during nVidia’s GPU Technology Conference [held a week after Intel’s demo], it was described as "shameful", "they don’t have a promised shipping product", to "they failed to execute and now AMD and especially nVidia are putting CPU [functionality] into a GPU… that’s the only way to go". Under the condition of anonymity, we received the following quote from a well respected analyst: "Intel failed to deliver on what they promised. Larrabee is in no shipping state. Intel ventured into the low-ASP area with Atom and now will start to being pressured by ARM on the low end. Intel’s biggest mistake was stepping on ARM’s turf – they have to fight Samsung, TI, Qualcomm and now nVidia".
But as we already wrote, the IDF demo was more a demonstration of Intel’s brilliant compiler than of Larrabee silicon. If silicon was the only important part, we could call this a multi-billion dollar fail.
Embarrassing parallels: Intel Larrabee vs. Boeing 787
Now, we’re not certain exactly what is going on with American companies and multi-billion dollar projects, but there is an unflattering comparison between two large multi-billion projects that were supposed to revolutionize their respective industries: the Boeing 787 and Intel’s Larrabee. What we found weird are the similarities between the two:
- Executives hype up the product and claim that the competition is "done"
- Analysts and biased press feed on given hype and praise the company, citing that "the competitors are dead"
- Lower-level employees start dissing the competition on Internet forums
- Executives "launch" the empty shell with engines attached/wafer with dead chips on it
- Internal roadmaps start moving the part deep into the future
- Executives stop talking about the project
- Engineers from different groups start publicly discussing that the thing is going haywire
- Roadmaps are moving even further in the future
- Executives start pretending that the project doesn’t exist
- Engineers start to leave the company and talk trash about their own project
- Key execs get the chop
- Demo is staged just for public [ZA001 taking several taxis / Larrabee RT demo]
- Multiple revisions of projects pump up the cost
- The project cost is measured in billions
So far, a happy ending for both projects is yet to be written. One might argue that both cases are textbook examples of over-promising and under-delivering, since the 787 was supposed to have been flying revenue service for the past 18 months [the test plane still hasn’t left the ground, i.e. a 30+ month delay], and journalists were supposed to be reviewing the cards by this time. According to a statement dated some time ago, we were in fact supposed to have Fusion 32nm CPU + 45nm GPU chips at this point in time. Bear in mind that this was the reason why AMD panicked and paid 5.9 billion [5.4 billion initial plus an additional 550 million for patents and so on] for ATI Technologies, all in order to be just one year behind Intel’s own fusion CPUs.
Personally, I don’t agree with "over-promise and under-deliver", because it is an over-simplification of the projects at hand. It is not easy to create a chip that has 10,000 transistors, let alone one in excess of two billion transistors. Given all of the problems that the almighty Intel [we’re not being sarcastic here, Intel is truly a behemoth of the world, not just the IT industry] experienced with Larrabee, each and every member of the IT industry should think twice before stating that creating a "chip that can do graphics" is something easy. You have to have key people and, more importantly, teams that work together for years and have experience – and even that does not guarantee success. If you go back to 2005, you will remember that ATI had a three-month delay on ATI R520 Fudo [Radeon X1K series] because of a single stupid bug in the silicon, or the nine-month delay of R600 [Radeon 2900].
However, consider the success and reliability of the Airbus A380, and the strength nVidia pulled from the NV30 fiasco and ATI pulled from R600. Eric Demers [AMD Fellow and "CTO (gfx)"] told us "If R600 never happened, we would not see the light and changed our ways". In the case of nVidia, the NV30 fiasco gave birth to the GeForce 6800, 7800 and the PlayStation 3 GPU, while ATI is now kicking ass with the Evergreen family. Given all that, we can say that both Boeing and Intel will deliver sooner or later. Everything else is empty talk.
Where is Larrabee today?
The question you’re probably asking yourself is "Is Larrabee dead?" and the answer we can give you is a flat out – NO. Intel knows that the future of the company is at stake: if AMD successfully fuses its CPU technology with ATI’s GPU technology and if nVidia implements an ARM core inside the GPU, Intel has to have an answer to that.
Too much money and human resources are invested in Larrabee to just let it go. Instead, Intel moved the best of the best, the so-called "Champions of Intel", into the Larrabee group. The "CoI" are actually engineers from various projects such as Nehalem [not exactly a hard thing to do, since both Nehalem and Larrabee teams are located in Oregon, plus there is always an Intel Express, a daily plane shuttle between CA and OR]. Now, we are not sure what brilliant CPU engineers could do in the creation of Larrabee, but according to the sources, working with Dadi Perlmutter is a very positive experience. Some of the sources we talked to didn’t have kind words for Pat Gelsinger or Sean Maloney, though. Given that more than two sources told us that, we believe this attitude is present in more places than the Hillsboro, Folsom, Braunschweig or Indian locations.
According to the information we have, Larrabee is at this moment "13-18 months out", putting the launch into 2011, rather than 2010. However, we do not believe this is a bad thing. If engineers had their way, Intel would probably keep their official events without mention of Larrabee until the current year, and prepare the world for the arrival of a discrete graphics product in 2011 and a "Haswell CPU + LRB GPU" in 2012. If the company launches in 2010, our sources doubt that the product will be anything but unfinished, and you can expect that at least one reviewer will find an application that doesn’t work as expected. In that case, "optimizations" such as Intel’s recent 3DMark Vantage one will not be taken lightly.
Recently, there was a slew of rumors that Larrabee is about to get cancelled, as Larrabee was allegedly "Gen1", "Gen2", "Gen3", "Gen4" and so on – taking into account the amount of changes that happened to the silicon. In a way, former Intel engineers that joined competing companies have every right to say that Larrabee went through multiple revisions of silicon and call them "generations". B silicon is significantly different than A silicon. However, given that the in-order core is still unchanged and that the problems were in the field of filling and syncing those cores up, we would say that Larrabee is still first gen, but that is not the silicon that will come out. Intel doesn’t have issues such as AMD or nVidia, since the company owns 45nm Fabs and can do whatever they need to have the silicon working like clockwork.
Now, we’ve dismissed the generation 3-4 rumor, and yet stated that the current silicon will never come to market. Here’s a big one – Intel knows that they’ve completely messed up with the current generation, and the company decided to scrap part of the current design and re-design the SIMD units from the ground up. This means the current diagrams featuring a 16-wide SIMD unit are out the door, because that design doesn’t work. The newly designed SIMD units will still probably be 16-wide and take AVX instructions, but they will be GPU-like and not CPU-like. Unfortunately, we cannot disclose any more details on the changes at hand, but we were told that the architectural changes are mandatory. The original Larrabee design cannot be described as anything better than a very, very expensive learning curve and it is up to the "second Larrabee" to become the first generation to come to market.
There are two paths that Larrabee can take; no middle ground is allowed. What can happen is that engineers fix the underlying issues and that Larrabee becomes what it was supposed to be, or it could turn into a very costly mistake, which would not be fixable simply by acquiring Imagination Technologies. Bear in mind that if Intel had acquired ATI Technologies, AMD would be cornered, just like nVidia.
If Larrabee fails to impress, no harm done – Core 2 architecture earned [and still is earning?] billions of dollars, so even a 3-4 billion dollar "mistake" could be easily forgotten. An Imagination Technologies PowerVR core at 4 GHz could be powerful enough to start competing with mainstream parts from ATI and nVidia. However, somehow we doubt that Intel’s minds would allow for PowerVR IP to work on equal terms, and if that certain VMware subsidiary continues to create crippled drivers "as ordered", Intel could find themselves with an nVidia Tegra chip beating the graphics performance of a Sandy Bridge CPU.
Intel as a company is an undisputed technology leader and even though it was known to kill unsuccessful projects in the past, this bid is the future of the company. With Larrabee, it will be easy to expand the CPU portion of Larrabee and try to persuade the world that x86 architecture is here to stay, from ARM-competing parts to high-performance computing. If not, Intel may be big, but ARM has grand plans for entering the low-ASP area and that could mean the beginning of a war for margins. Do bear in mind that ARM IP was in over four billion chips in 2008, so the leader in installed user base is ARM, not Intel or AMD. On the high end, nVidia leads the parallel world and now has the support of fat governmental contracts such as the 10+ PFLOPS Oak Ridge setup. With the announcement that Australians will go for a 1 Exa-FLOPS machine by 2020 and are currently building software on nVidia GeForce and Tesla cards, Larrabee has the potential to enter every market – from a cellphone to a supercomputer. This one is too big to be missed, folks.
Every project has its challenges, and Larrabee has definitely had its fair share. If Intel had kept a cool head and hadn’t started pre-announcing the architecture, causing all the legal and engineering pain, this product would probably be welcomed by analysts, press, and the like. Instead, we currently have a Boeing 787 in the form of silicon. Larrabee is going through significant changes in the architecture and only time [and money] will tell what will happen with the project.
Will marketing and sales cause more short fuses in the engineering team or will the execs leave engineers in peace to "create the most complex part Intel has ever created"? What happens from now on will define Intel as a company. We have a very exciting 12-18 months ahead of us.
Again, our message to Intel’s executive team is very simple: Leave Larrabee teams alone. You’ve done enough.