ECC errors on DIMM 1/A of Mac Pro




PowerMike G5
Dec 24, 2006, 06:55 PM
Hey all,

I have 4GB (4x1GB) of RAM in my Mac Pro. I just noticed that one of my sticks is showing 9 ECC errors in About This Mac. The other 3 have a status of OK.

I've never seen this before. Does anyone know what this means? My computer seems to be running fine at the moment, even though the status field for that one DIMM shows ECC errors ...



Nermal
Dec 24, 2006, 06:56 PM
Someone else (CanadaRAM, I'm looking at you! :)) can probably confirm this, but I'm pretty sure it means that your RAM is failing. Is it Apple RAM or third-party?

electronbee
Dec 24, 2006, 07:02 PM
If your OS is detecting ECC errors, then you most definitely have bad RAM. You can try reseating it, as sometimes that will help. You can also call a local computer store and see if they have a RAM checker: a little box that runs a series of error-rate tests to verify that the module is in fact bad. Some Best Buy stores have one, but not every store does.

Is your machine acting odd? Can you suspend/sleep it without any issues? Those are other signs of bad RAM.

I have seen this issue with all brands of RAM. In reality, there are only a handful of manufacturers of the actual chips, so one brand is comparable to another.

ChrisA
Dec 24, 2006, 07:27 PM
"ECC" stands for "Error Correcting Code". Extra check bits are stored alongside the data, so that when a bit goes bad, the error can be detected and corrected. So the system can keep running, but you need to replace the failing RAM before more bits fail and the part stops working.

This is a MUCH better deal than regular non-ECC RAM, where if it fails you don't know, except for odd system hangs and crashes. This is why you paid the big bucks for the RAM in the Mac Pro. So be happy the ECC is working for you, and glad that you have a one-year warranty.

Way back in the dark ages before the PC era, almost ALL computers used ECC RAM. It was very uncommon not to have it. Then came the small desktop machines and the race to be cheap, so they dropped the extra redundant bits and saved a few percent on the price. Back then we'd look at the reports of failing RAM and schedule maintenance at some convenient time. This works well for a professional machine: make your clients happy now, and fix the RAM when you can.

The Mac Pro has ECC RAM. It is giving him good data but reporting that it had to correct an error. If no more bits fail, he could keep using this module with no problems for years. I'd still go for a warranty replacement, but there's no hurry. The Mac Pro's RAM can correct single-bit errors and detect double-bit errors.
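To make "correct single-bit errors and detect double-bit errors" concrete, here is a minimal sketch of an extended Hamming code, the textbook scheme usually called SECDED. It is an illustration only, not the Mac Pro's actual memory-controller logic: it protects just 4 data bits with 4 check bits so the whole codeword fits on screen, whereas ECC memory applies the same idea to much wider words (commonly 8 check bits per 64 data bits). The names `encode` and `decode` are hypothetical.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword (index 0 = overall parity)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    # Data bits go in the non-power-of-two positions 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = d
    # Hamming parity bit p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    # One extra parity bit over positions 1..7 is what enables double-error detection.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)  # work on a copy so the caller's list is not modified
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set positions; 0 for a valid codeword
    overall = sum(code) % 2  # odd total parity means an odd number of flips
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume a single flip: the syndrome names its position (0 = the parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped, detectable but not fixable.
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble
```

For example, flipping position 5 of `encode(0b1011)` and decoding gives `('corrected', 11)`, while flipping positions 5 and 6 gives `('uncorrectable', None)`. The overall parity bit is the design choice that separates the two cases: an odd number of flips is treated as one correctable error, an even number as an error that can only be reported.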

The RAM uses a serial data channel, where the bits arrive one after another over the same few connector pins. The computer reads trillions of bits of data an hour, so if the same nine bits have a problem over and over, I doubt reseating the RAM will help. The problem is likely with those nine bits themselves.
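ChrisA's reasoning (the same bits failing again and again point at the module itself, not at how it is seated) can be sketched as a simple heuristic over an error log. This assumes a log that records the address and bit position of each corrected error; About This Mac only shows a count, so the record format, the threshold, and the name `classify_errors` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical log record: (physical_address, bit_position) of one corrected error.
ErrorRecord = Tuple[int, int]


def classify_errors(records: Iterable[ErrorRecord], repeat_threshold: int = 2) -> str:
    """Rough heuristic, not a diagnosis.

    The same cell being corrected repeatedly suggests a weak or failing bit on
    the module; one-off hits at unrelated addresses are more consistent with
    transient upsets.
    """
    hits_per_cell = Counter(records)  # how many times each (address, bit) was corrected
    if not hits_per_cell:
        return "no errors"
    if max(hits_per_cell.values()) >= repeat_threshold:
        return "recurring: likely a faulty cell on the module"
    return "scattered: could be transient upsets"
```

With this sketch, nine corrections at the same address and bit come back as "recurring", while nine corrections at nine different addresses come back as "scattered".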

Are RAM checkers smart enough to know that a bit was corrected? If not, the machine will just see good data and not report a problem.

PowerMike G5
Dec 26, 2006, 06:28 PM
Thanks all ... I seem to have no problems so far, and it is now showing status OK after a reboot ... I guess the RAM worked just as it's supposed to: it did Error Correcting :)

Demon Hunter
Dec 26, 2006, 10:39 PM
So it's okay to get errors then? :confused:

trainguy77
Dec 26, 2006, 11:36 PM
Well, generally you should not be getting a large number of errors; something like under 2 in a month or so is fine. However, the errors could mean that your system is running too hot, which may be what causes them. But it could also be bad RAM or a chip not seated correctly. CanadaRAM... correct me please. :D

[G5]Hydra
Dec 28, 2006, 11:54 AM
People are correct: ECC memory will correct errors, and that is a good thing. The general rule for ECC is that one error on one stick over the course of a few months is not a showstopper, but multiple errors on one stick in a short period of time indicate a problem: either the stick is not seated properly and/or is getting too hot (which will also cause errors), or it is defective. One of ECC's functions is to warn of and correct a "flipped bit" caused by high-energy particles from space colliding with the module. These do happen, but rarely, and the likelihood of more than one hitting the same module in the same day is vanishingly small. Apple has an article on ECC:
http://docs.info.apple.com/article.html?artnum=86700

-Jerry C.
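Jerry's rule of thumb (one error in a few months is fine, several in a short period is a problem) can be written as a sliding-window check over the timestamps of corrected errors. The one-week window and the limit of one error per window below are placeholder values chosen for illustration, not figures from Apple or the memory manufacturers, and `needs_attention` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List

# Placeholder thresholds (assumptions, not vendor guidance).
WINDOW = timedelta(days=7)
MAX_ERRORS_PER_WINDOW = 1


def needs_attention(error_times: List[datetime],
                    window: timedelta = WINDOW,
                    max_in_window: int = MAX_ERRORS_PER_WINDOW) -> bool:
    """True if more than `max_in_window` corrected errors fall within any span of `window`."""
    times = sorted(error_times)
    start = 0
    for end in range(len(times)):
        # Advance the left edge until times[start..end] spans at most `window`.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 > max_in_window:
            return True
    return False
```

Under these placeholder settings, a single error, or two errors three months apart, returns `False`, while two errors an hour apart returns `True`.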

CanadaRAM
Dec 28, 2006, 12:03 PM
The question is whether the errors are externally generated (cosmic rays, for example, or power spikes) or generated from a failing chip on the module itself. The log doesn't supply that answer.

We are (coincidentally) discussing with our manufacturers right now how many ECC errors can be considered normal "background" and at what level they should trigger concern.

It's not exactly like dead pixels, where a failure rate of X pixels is deemed acceptable. Theoretically, RAM should be perfect, but there are external and intermittent influences whose effects, for the first time, we can see in the correctable errors.

In the past, a memory error like this was called "Geez, the program froze. I wonder why? Oh well, I'll restart." And unless it happened every hour, we lived with it.
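One way to frame CanadaRAM's "background" question is to treat background errors as independent events arriving at some average rate and ask how likely a given count is under that rate (a Poisson model). The background rate is exactly what is still being worked out, so any value plugged in here is an assumption, and the model is a simplification: a failing cell tends to produce clustered, not independent, errors. The name `poisson_tail` is hypothetical.

```python
import math


def poisson_tail(k: int, rate: float) -> float:
    """P(N >= k) for N ~ Poisson(rate): the chance of k or more background errors in one period."""
    if k <= 0:
        return 1.0
    if rate <= 0:
        return 0.0
    # Start at P(N = k), computed in log space so large k cannot overflow.
    term = math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))
    total = 0.0
    n = k
    # Sum the upper tail directly (rather than 1 - CDF) so tiny probabilities keep their precision.
    while term > total * 1e-16:
        total += term
        n += 1
        term *= rate / n  # P(N = n) from P(N = n - 1)
    return total
```

For instance, with a made-up background rate of 0.1 errors per month, `poisson_tail(9, 0.1)` is vanishingly small, so nine errors in a month would point to something other than background upsets under this model.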