Update: We have more information on the source of the bug.

In our Sandy Bridge review I pointed out that Intel was unfortunately very conservative in one area of the platform: its chipset. Although the 6-series chipset finally brought native 6Gbps SATA to Intel platforms it failed to fix issues with 23.976 fps video playback. Intel also failed to deliver a chipset that can support SNB's processor graphics as well as overclocking. Today, things just got even more disappointing.

Intel just announced that it has identified a bug in the 6-series chipset, specifically in its SATA controller. Intel states that "In some cases, the Serial-ATA (SATA) ports within the chipsets may degrade over time, potentially impacting the performance or functionality of SATA-linked devices such as hard disk drives and DVD-drives."

The fix requires new hardware, which means you will have to exchange your motherboard for a new one. Intel hasn't posted any instructions on how the recall will be handled other than to contact Intel via its support page or contact the manufacturer of your hardware directly. In speaking with motherboard manufacturers it seems they are as surprised by this as I am. 

Intel will begin shipping the fixed version of the chipset in late February. The recall will reduce Intel's revenue by around $300 million, and Intel expects to spend around $700 million to repair and replace affected systems.

It Was All Good Just a Week Ago

Here’s the timeline.

Intel has been testing its 6-series chipset for months now. The chipset passed all of its internal qualification tests as well as all of the OEM qualification tests. These are the same tests that all Intel chipsets must go through, testing things like functionality, reliability and behavior under various conditions (high temps, low temps, high voltage, low voltage, etc.). The chipset made it through all of these tests just fine.

There are two general types of problems you run into in chip manufacturing. The first is an engineering oversight: functional problems that will cause a failure during your validation tests. You get these when your engineers don’t get enough sleep and design a circuit with a logical or functional flaw in it. These were the types of errors that resulted in NVIDIA’s Fermi delay, for example.

The second type of problem is more annoying: it’s a bug of a statistical nature. In these situations the problem doesn’t appear on every chip in every situation, but on every nth chip out of every n x somereallybignumber chips. When a bug doesn’t present itself in small quantities, it’s very difficult to track down. This is the nature of the 6-series chipset bug and it’s also why the problem didn’t appear sooner.
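
To see why a statistical bug can slip through qualification, here is a minimal sketch in Python. It assumes each chip independently has some small chance of showing the failure within the test window. The rate used below is hypothetical (it is not an Intel figure), and the function name is made up for illustration. The real defect degrades over time, so this only illustrates the sampling problem, not the failure mechanism itself.

```python
def prob_at_least_one_failure(sample_size: int, failure_rate: float) -> float:
    """Chance that a sample of chips contains at least one failing chip,
    assuming each chip fails independently with the same probability."""
    return 1.0 - (1.0 - failure_rate) ** sample_size


# Hypothetical per-chip chance of failing within the test window.
# This number is an assumption for illustration, not an Intel figure.
ASSUMED_RATE = 1e-4

for n in (100, 1_000, 10_000, 100_000):
    chance = prob_at_least_one_failure(n, ASSUMED_RATE)
    print(f"{n:>7} chips tested: {chance:.3f} chance of seeing a failure")
```

Under that assumed rate, a qualification run of a few hundred parts would almost never show the problem, while a production run in the hundred-thousand range would almost certainly show it.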

Intel mentioned that after it had built over 100,000 chipsets it started to get some complaints from its customers about failures. Early last week Intel duplicated and confirmed the failure in house.

Intel put together a team of engineers to discover the source of the problem. Based on the timeline it looks like it took them a couple of days to figure it out. Intel then spent a few more days trying to understand the implications of the issue. Finally, late last night, Intel decided the only course of action would be a recall and it halted production of its 6-series chipsets.

Recalls are never fun to do. If you don’t have a replacement product in the market it means that your sales come to a halt. You also have to deal with actually recalling all of the faulty hardware, which costs a lot of money. Intel expects it’ll cost $700M to actually recall and fix hardware in the market today and another $300M of lost revenue for the chipset business while this is all happening. Altogether we’re talking about a billion dollar penalty. It’s like Intel’s version of the RRoD [Ed: Microsoft's Xbox 360 Red Ring of Death], but without the years of denial.

The Failure Manifested

I asked Intel how we’d know if we had a failure on our hands. The symptoms are pretty simple to check for. Intel says you’d see an increase in bit error rates on a SATA link over time. Transfers will retry if there is an error but eventually, if the error rate is high enough, you’ll see reduced performance as the controller spends more time retrying than it does sending actual data.
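
The relationship between error rate and performance can be sketched with a deliberately simplified model in Python. It assumes every corrupted frame is simply resent until it gets through and that errors are independent; real SATA error recovery (link resets, timeouts, speed downshifts) is more involved. The function name and the list of error rates are hypothetical, chosen only for illustration.

```python
def effective_throughput(link_mb_s: float, frame_error_rate: float) -> float:
    """Useful data rate when every corrupted frame is resent until it succeeds."""
    if not 0.0 <= frame_error_rate < 1.0:
        raise ValueError("frame_error_rate must be in [0, 1)")
    # Geometric distribution: on average 1 / (1 - p) attempts per good frame.
    expected_attempts = 1.0 / (1.0 - frame_error_rate)
    return link_mb_s / expected_attempts


# A 3Gbps SATA link carries at most about 300 MB/s after 8b/10b encoding.
NOMINAL_MB_S = 300.0

# Hypothetical error rates, from a healthy link to a badly degraded one.
for p in (0.0, 0.01, 0.1, 0.5, 0.9):
    rate = effective_throughput(NOMINAL_MB_S, p)
    print(f"frame error rate {p:>4}: {rate:6.1f} MB/s")
```

In this model, once the frame error rate passes 0.5 the link makes more retry attempts than successful transfers, which is the point where the slowdown would become hard to miss.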

Ultimately you could see a full disconnect - your SATA drive(s) would no longer be visible at POST or you’d see a drive letter disappear in Windows.

It’s Limited to 3Gbps Ports Only

Interestingly enough the problem doesn’t affect ports 0 & 1 on the 6-series chipset. Remember that Intel has two 6Gbps ports and four 3Gbps ports on P67/H67; only the latter four are impacted by this problem.


Intel's DP67BG: the blue SATA ports on the right are 6Gbps, the black ones are 3Gbps. The blue ports are unaffected by the bug.

If you’re a current Sandy Bridge user and want to be sure you don’t have any problems until you can get replacement hardware, stick to using the 6Gbps ports on your board (which should be the first two ports).

The Fix

The SATA bug exists in hardware and there’s no way to provide a driver or firmware update that can fix it. The fix requires a metal layer change, which means a new hardware stepping and explains the roughly three-week delay before replacement hardware is ready.

What About Current Sandy Bridge Owners?

On its conference call to discuss the issue, Intel told me that it hasn’t been made aware of a single failure seen by end users. Intel expects that over 3 years of use it would see a failure rate of approximately 5 - 15% depending on usage model. Remember this problem isn’t a functional issue but rather one of those nasty statistical issues, so by nature it should take time to show up in large numbers (at the same time there should still be some very isolated incidents of failure early on).

Intel has already halted production of its 6-series chipsets and will begin shipping fixed versions of the chipset in late February. You can expect motherboard shortages through March at least. Intel hopes to be able to meet demand by April.

Currently Intel says the best course of action is to contact its support team for information on replacement, although I’m guessing once the fixed chipsets are available we’ll have replacement plans from all of the motherboard manufacturers.

No One is Happy About This

Given the timeline of discovery, Intel didn’t waste any time letting its partners know about the problem. On the bright side that means Intel and its partners weren’t plotting against end users to sweep this under the rug. On the down side, OEMs and motherboard manufacturers can’t be happy about this - they woke up to news of the recall at the same time you and I did.

Intel is starting to have discussions with OEMs today (yep, you read that right) on how to handle the recall and when fixed hardware will be available. This is the reason why there’s currently no official, public recall plan in place. In the coming weeks I expect this to get ironed out but for now the best advice I can give existing SNB owners is to use their 6Gbps ports and wait.

Update: Intel informs me that it was legally bound to make the recall disclosure public before informing its OEM customers in this case. This is apparently a regulatory requirement.

Z68 Schedule Unaffected

I asked Intel if 6-series derivative chipsets were affected by the problem, specifically the Z68 chipset. While all 6-series chipsets are exposed to the issue, the launch schedule for all future derivatives remains unaffected. Z68 will continue to launch somewhere in Q2 2011. I would expect to see motherboards around April.

At CES I spoke with Intel at length about the frustrating nature of the P67/H67 feature segmentation. The fact that there's no chipset that will let you use Intel's processor graphics and overclock your CPU is a major oversight. Intel's Z68 chipset will address this shortcoming, as well as add features (e.g. SSD caching) that are exclusive to Z68. I am disappointed that Intel wasn't better prepared on the chipset side at the SNB launch and today's announcement is icing on the cake. If you're going to have to wait to buy anyway I would recommend waiting until Z68 motherboards hit the market.

For existing owners, I would hope that Intel and the motherboard manufacturers might offer some sort of a trade-up or trade-in program to move to Z68 since you're going to have to replace your motherboard anyway.

Wait Longer for those MacBook Pros

Have you noticed a lack of dual-core Sandy Bridge based notebooks on the market? Intel wanted to but couldn’t launch every last SNB SKU at the same time, so the dual-core notebooks got pushed out until mid-to-late February. Unfortunately, that was pre-bug. With this latest delay you shouldn’t expect dual-core SNB notebooks until a few weeks after their original launch date, at the earliest.

If we assume fixed chipsets are available in the last week of February, they can be put into systems the first week of March. Then expect at least a week of testing and validation if not more. Add another week to ramp up production and we’re looking at late March or early April for dual-core SNB notebooks. Those of you waiting on Apple’s updated MacBook Pros fall into this category. I’d say April is a safe bet if you’re waiting on an upgrade.

162 Comments

  • ghitz - Tuesday, February 1, 2011

    A handful? All of them.
  • wolfman3k5 - Tuesday, February 1, 2011

    It doesn't matter how I try to spin this in my head, I just can't eat this crap out. The positive spin that Anandtech is trying to push is also laughable at best: "wait for the Z68, a great chipset...". Aham... the P67 was supposed to be good as well. I mean, all new AMD 8xx series chipsets have 6 SATA 6Gb/s ports by default that work perfectly fine, yet Intel had to give us only two SATA 6Gb/s ports, and four SATA 3Gb/s ports. Aren't you getting a little greedy Mr. Intel? I mean, I don't know what to call it? Divine Intervention or Karma? But you're losing about 1 Billion Dollars Mr. Intel on this fiasco. It is 2011, and a hardware bug like this is simply unacceptable. If I didn't know any better I would say that shit happens, and something more sophisticated like say Cache issues or memory controller issues, or some sort of stupid complex bug, would be acceptable. But SATA port issues? And if I'm reading right it's all because of the manufacturing process!!! I mean, Mr. Intel, how fucking greedy are you mate? Read my lips: GET IT RIGHT THE FIRST FUCKING TIME! Honest to God! I swear, I don't give two donkey craps and 5 cents for the replacements that are coming out! I mean, this is the question that all of you that have bought into Sandy Bridge (myself included) should be asking: is Intel's colossal fuck up worth the time that you'll be spending disassembling your computer, going to the post office, waiting for about two weeks on a replacement, then reassembling your computer with the new board (and God forbid, maybe during shipping the replacement got screwed up), then re-installing Windows (hopefully it won't be necessary, but most people will have to anyway)? Hey Mr. Intel, are you feeling my anger, mate? I mean, I'm pissed, so pissed that instead of doing all the work that I've just described, I'm switching to your competitor. I got a Phenom II X6 1100t + a Gigabyte 890FX on the way from NewEgg.
    By the way, the Phenom II X6 1090t is $199.99 now, and the Phenom II X6 1100t is $239.99. Sweet prices! I just wanted to enjoy my workstation with Sandy Bridge, and do my software development, and use all of my 6 hard drives + two Blu Ray burners... That's all I wanted. And I need this system to be bullet proof reliable. I hope that FOR CRYING OUT LOUD an Intel rep. will talk about this issue to us, the fricking enthusiast community!
  • ghitz - Tuesday, February 1, 2011

    Agreed! It seems to these titans that our time is worth s***t.
  • ghitz - Tuesday, February 1, 2011

    I did RMA all my stuff back to the egg.
  • ClagMaster - Tuesday, February 1, 2011

    In the meantime, I expect there will be an opportunity to get a discounted Sandy Bridge until some P67/H67 become available with the fixed chipset.

    I agree, this kind of bug stinks.

    If you think this bug is bad, do you remember the bug the first Phenom processors had? It required a BIOS fix and made them operate at a slower speed than they should have.

    AMD did not recall these processors. Intel, however, is recalling their bad chipsets.
  • ghitz - Tuesday, February 1, 2011

    Very true. AMD probably would have been in a really bad financial situation if they had initiated a full recall; Intel, on the other hand, has the resources.
  • ClagMaster - Tuesday, February 1, 2011

    Having Intel confess the failings of the P67/H67 after investigating this bug, then offering replacement motherboards is a very professional act of an industry leader.

    Intel really does stand behind their products with a SOLID warranty.

    I was about ready to invest in a H67 motherboard but decided to wait awhile.

    I am not a first adopter because of similar problems I have experienced over the years. I prefer to wait until a Revision 2 or 3 motherboard is available before purchasing.
  • peternelson - Tuesday, February 1, 2011

    Prior to the official announcement of Sandy Bridge, most people continued buying systems with last generation processors (salespeople understandably wanted to hit their holiday sales targets so would not mention the impending obsolescence even if they knew, to avoid losing the sale).

    As for myself I knew about Sandy Bridge but needed something now, with many pcie lanes for I/O, and saw the socket 2011 systems put back a quarter and still far away, so purchased an X58 based system, with the idea to build a small cluster of SNB for AVX math when they became available.

    Then came the official announcement and release of the new models with impressive benchmarks, reviews and news coverage.

    Now that the support chipset for SNB is known to be faulty I don't think people will want to buy systems with inherently faulty motherboards (which have the related hassle of later swapout, maybe having to do without your entire computer while parts are replaced), and in the case of UK seller Scan Computers (likely others too), it has suspended sales of SNB boards and systems already until further notice.

    Until the silicon respin, putting the replacement chips on motherboards and distribution of those boards into the channel in volume (which I guess would take perhaps 2 months) this will likely almost put a halt on sales of performance systems as informed buyers hold off their purchases, and distributors don't want the hassle of processing returns of even more parts so will suspend sales.

    By that time the other new chipset Z68 supporting both onboard video AND overclocking will be almost with us and I can't see many people investing in the existing chipset when the more flexible one is so close, again delaying purchases. Even if fixed, the original two chipsets may have a tarnished reputation from these problems (could I be sure my supplier gave me a fixed respin one vs the original ones that could fail sometime in future?).

    Looking at the Intel stock price on Nasdaq, I don't see much of a drop in INTC stock yet.

    In my view the cost of this problem is not just the estimated replacement cost of 1 billion, but also the missing, lost or delayed sales of new processors and systems over the rest of quarter 1.

    This affects not just shipments of the chipsets with the bug, but the new processors, since nobody has an error free board in which to run them until the fix is done and distributed. Rare possible sensible case to still buy one might be people who need to get coding apps that use AVX instructions and still hope to ship to their own release schedules, although such developers will be a very small minority and will need to avoid using the unreliable SATA ports. Even if all those delaying buyers eventually still go with Sandy Bridge (rather than consider waiting for AMD Bulldozer), the income from them will be late, resulting in interest charges or foregone earned interest, and delays in return on investment made in design and fabrication for these new generation processors and chipsets.

    I don't yet see any regulatory news announcement giving investors an update to revised Quarter 1 sales estimates that I would expect over something like this. So far this SATA bug news is in the early adopter tech community on sites like Anandtech, but most of the finance types and investors may be unaware. I would think Intel should be giving them some heads up about the impact on revenue streams. Once the story is disseminated I imagine the stock would take a temporary drop, but Intel does deserve credit for recognising and owning up to the problem quickly.

    I share Anand's conclusions about the puzzling segmentation of new chipset features and will wait for Z68. Combining this with the SATA bug discovery, I feel better about having bought into X58 as a step towards socket 2011.
  • Zoolookuk - Tuesday, February 1, 2011

    Meh - I'm still pretty stoked about my MBP with an i7 and a 512GB SSD... Sandy Bridge can wait a while before making it obsolete.
  • JohnZoidberg - Wednesday, February 2, 2011

    IS INTEL INSANE?
    (sorry for the caps, but this is insane)
    "The fix requires new hardware, which means you will have to exchange your motherboard for a new one."
    "Interestingly enough the problem doesn’t affect ports 0 & 1 on the 6-series chipset. Remember that Intel has two 6Gbps ports and four 3Gbps ports on P67/H67, only the latter four are impacted by this problem."
    So why don't they just give customers a choice?
    Choice 1:
    $25 or $50 because you can only use 2 ports
    Choice 2:
    get $0 and a new MBO
    I mean WTF... They will recall all MBOs? And I would guesstimate that 80+% of the users use 1 or 2 ports.
