http://news.cnet.com/8301-30685_3-10370026-264.html?tag=newsLeadStoriesArea.1
As with Google's earlier study on hard drive failures and operating temperature (which showed that temperature was a smaller factor in drive failure than generally thought), it's great to see the company contributing interesting information on real-world hardware performance and durability from its own operations.
"We found the incidence of memory errors and the range of error rates across different DIMMs (dual in-line memory modules) to be much higher than previously reported," according to the paper jointly written by Bianca Schroeder, a professor at the University of Toronto, and Google's Eduardo Pinheiro and Wolf-Dietrich Weber. "Memory errors are not rare events."
How many errors? On average, about one in three Google servers experienced a correctable memory error each year and one in a hundred an uncorrectable error, an event that typically causes a crash.
That may not sound like a high fraction, but bear these factors in mind, too: each memory module experienced an average of nearly 4,000 correctable errors per year, and unlike your PC, Google servers use error correction code (ECC) that can nip most of those problems in the bud. That means a correctable error on a Google machine likely is an uncorrectable error on your computer, said Peter Glaskowsky, an analyst at the Envisioneering Group (and member of CNET's blog network).