AMD, Companies, CPU, Hardware, Intel

Alleged Core i7 TLB issue is NOT a story at all

Earlier today, news spread from a fellow website claiming that an ominous TLB bug had struck Intel's latest baby, the Core i7 series. Translation Lookaside Buffer errata are notorious and have taken a financial and reputational toll on both Intel and AMD in the past.

News of a TLB bug in Core i7 had the potential to become the story of the year, much as AMD lost a huge chunk of market confidence 12 months ago over the TLB bug in Barcelona/Agena (Opteron/Phenom). Could the Nehalem architecture have a similar flaw?

Could Nehalem (right) have the same issue as Phenom (left) and Core 2 Duo (middle)?

Before running my story, I decided on the one hand to read the document in question (Intel Core i7 Processor Extreme Edition Series and Intel Core i7 Processor Specification Update November 2008), and on the other to wait for Intel's response. Reading the document, it looked to me as if the errata had already been fixed and the launch platform was actually unaffected by this problem.

Shortly after 10 AM PT, I received an answer from Dan Snyder, Intel's PR manager and CPU specialist, in the form of an official statement from Intel:

This is simply a pointer to a previous document written in April 2007.  This document is an application note (advises on programming techniques) that programmers have had since April of 2007.  This item in the Nehalem spec sheet is a web pointer, under the heading “spec clarification”.  The reporter who wrote this did not contact us and we will try to clarify this with him.

The story did not end there. I also received a detailed clarification of this "issue", which turned out to be a non-issue:

SPEC CLARIFICATION AAJ1 was initially added due to an issue on the Intel® Core 2 Duo processor which was previously corrected with a BIOS update; this issue does not impact the Nehalem Family of CPUs.  There are errata on the Intel® Core i7 processor that relate to the TLB.  These all relate to improper translations or error reporting, and all of those that impact functionality have been fixed via BIOS updates prior to Core i7 launch.

As you can read above, the error in question was "featured" in the initial batch of processors based on the Conroe architecture (Core 2 Duo). Nehalem itself shipped with bugs (all processors do, which is why the microcode update feature was implemented in the first place), but not with the stability-threatening bug that plagued the Core 2 Duo and quad-core Phenom/Opteron of yesteryear.

If this error could not be solved except by sacrificing performance through crippling L3 cache bandwidth (as with Barcelona/Phenom) or by a product recall (as with the Pentium 100), Intel would have one heck of a disaster on its hands. But thanks to mechanisms implemented by both Intel and AMD (the aforementioned microcode update, essentially a firmware patch for the CPU delivered through the BIOS), small errors and bugs are easily squashed.

A storm in a cup of water, as my grandma would say.

  • John

    In other words, you’re saying that your friend and colleague, Fudo, posted a bogus story. Shocker.

  • Johnathan

    Thank you for the very clear and concise explanation of the false articles that have been floating around. I am coming from the EVGA forums, where a thread was started about this "supposed" issue with the i7.

    I have sent off an e-mail to Intel to get clarification, but now I feel it is not needed thanks to this well-written article before us.

    To make a long story short, a moderator at EVGA, "RussianHAXOR", pointed another member, "AuDioFreaK39", to this article, who in turn pointed the rest of us over.

    Thank you for real, true and honest reporting.
    Johnathan (UserX)

  • Hi Jonathan,

    The goal of this blog and the upcoming site is to offer insightful views and opinions, and this story was just that.

    I read the whole document and saw references to an old document, which had resulted in that old story on the INQ. Woodcrest/Conroe suffered from instabilities, and prior to launch Intel lost the LLNB contract due to those instabilities and a higher reject rate. That bug (and a RAID controller bug) cost Intel its "technology partner" status with GAO, and AMD subsequently walked right in with its CPUs, scoring major wins in the US supercomputing space.

    One nasty scaling benchmark from Lawrence Berkeley National Labs also killed off the Woodcrest/Conroe architecture there… so seeing that bug reappear in Core i7 didn't make any sense.

    After reading the piece, I decided to wait for Intel’s answer, since the situation was confusing. This meant sitting on a story, but it is better to offer a good answer than a half-baked one.

    I'm not defending anyone; being editor-in-chief is hard, and it is a role I will be taking on a few weeks from now. But I will strive to offer complete information, no matter what, and I will hold my colleagues who are joining the project to the same standard.

    This would have been a neat story to break, but it was just too murky. If Nehalem were causing instabilities, that would be the "story of the year".



  • Jon Strabala

    Please look at the Fudzilla posting of 12/16/2008 titled "Nehalem TLB is not completely fixed". It would seem that this is indeed PR "spin" and that the problem is real.

    "AAJ1. MCi_Status Overflow Bit May Be Incorrectly Set on a Single Instance of a DTLB Error"
    "AAJ1 – No Fix"

    Of course, the final impact (e.g., incorrect results) is not documented at all.