Re: PC Motherboard Chipsets and Parts Vendors
"Arny Krueger" <arnyk@hotpop . com > wrote in message
news:ef-dnWpMfbKdoLHVnZ2dnUVZ_gydnZ2d@comcast . com ...
> "Richard Crowley" <rcrowley@xp7rt . net > wrote in message
> news:691c6aF2v01nrU1@mid.individual . net
>
>> An excellent point. If SER was any kind of significant
>> issue for the average computer user, you can be sure that the
>> people who market them would jump at the chance to prove
>> to customers that the extra $$ was worth it. Instead,
>> even people who make ECC admit that it isn't really
>> necessary in your average PC.
>
> Here's a real-world study:
>
> * w w w .ece.rochester.edu/~xinli/usenix07/
>
> 212 servers with about 4 GB RAM each were monitored for about 90 days.
>
> There were 2 errors.
>
> This is a small enough number of errors that it seems to be statistically
> questionable. IOW the same experiment might have found 1 error, or 3
> errors or more or less, when repeated. However 10 errors or more seems to
> be highly unlikely.
>
> We might then think that 50 servers running for a year would give similar
> results - 10 errors or less.
>
> A 4 GB PC might thus run for 5 years (1 to 2 times its useful life) and
> not develop any errors the whole time it was turned on.
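
For anyone who wants to check the arithmetic in the quoted estimate, here is
a rough sketch in Python. It assumes the errors arrive independently at a
constant rate (a Poisson model), and that a desktop sees the same rate as the
servers in the study; both are my assumptions, not claims from the paper. The
function names are made up for illustration.

```python
import math

# Figures quoted from the study above.
SERVERS = 212
DAYS = 90
ERRORS_SEEN = 2


def errors_per_machine_year(errors, machines, days):
    """Point estimate of the error rate per machine-year."""
    machine_years = machines * days / 365.0
    return errors / machine_years


def prob_no_errors(rate_per_year, years):
    """Poisson probability of zero errors over the given span."""
    return math.exp(-rate_per_year * years)


def prob_at_least(k, mean):
    """Poisson probability of k or more events when `mean` are expected."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


rate = errors_per_machine_year(ERRORS_SEEN, SERVERS, DAYS)
print(f"rate: {rate:.4f} errors per machine-year")
print(f"P(no errors in 5 years): {prob_no_errors(rate, 5):.2f}")
print(f"P(10 or more errors if 2 are expected): {prob_at_least(10, 2.0):.6f}")
```

With the study's figures this comes to roughly 0.04 errors per machine-year,
and a bit better than an 80 percent chance of five clean years on one 4 GB
machine. With only 2 observed errors the rate itself is very uncertain, so
treat the output as an order-of-magnitude figure.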

That is reasonable. However, there is another issue, which has to do with
noise modeling. Because of the extremely large number of current patterns in
a motherboard, it's impossible to simulate them all. The same is true of
RAM chips, which is why memory tests work so poorly -- they fail to account
for pattern sensitivity.

The EMI landscape changes constantly. In complexity, it's much like the CPU
test problem -- it's impossible to completely test a CPU for faults. The
board cannot be run through all the patterns, because there are too many.
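
To give a feel for "too many", here is a small Python sketch that counts only
the ordered pairs of consecutive words on a data bus and asks how long it
would take to drive each pair once. The rate of one billion transitions per
second is a made-up round number, and the function names are hypothetical.
Address lines, control signals, and everything else on the board are ignored,
so this is a gross underestimate of the real pattern space.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def transition_count(bus_bits):
    """Number of ordered (previous word, next word) pairs on a bus this wide."""
    return (2 ** bus_bits) ** 2


def years_to_cover(bus_bits, transitions_per_second):
    """Years needed to drive every ordered pair once at the given rate."""
    return transition_count(bus_bits) / transitions_per_second / SECONDS_PER_YEAR


for bits in (8, 16, 32, 64):
    # 1e9 transitions per second is an assumed, illustrative rate.
    print(f"{bits:2d}-bit bus: {years_to_cover(bits, 1e9):.3g} years")
```

Even on this simplified model, a 32-bit bus needs about 585 years at that
rate, and a 64-bit bus is far beyond any test schedule.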

Server motherboards use buffered RAM for two reasons: it allows more RAM on
a bus, and it also increases noise margins. Desktop machines don't use
buffered RAM. When Microsoft made their "complaint" about hardware, they
fingered ECC, but the RAM makers fingered the motherboard makers in turn.
They suggested a "reread" facility, reading the RAM again, which, they
stated, would knock out some of the errors caused by the EMI landscape. But
ECC checks the RAM bus as well, so unless a noise-induced error goes
beyond one bit, ECC will reduce noise-induced errors as well.
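
The one-bit limit can be shown with a toy code. The Python sketch below uses
a Hamming(7,4) code plus an overall parity bit, which corrects any single
flipped bit and detects, but cannot repair, a double flip. Real ECC memory
commonly uses a wider code of the same family (72 bits protecting 64), so
this is only an illustration of the principle, not how any particular
chipset does it. The function names are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # Hamming positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)     # Hamming positions that carry check bits


def syndrome_of(word):
    """XOR of the positions (1..7) of all set bits; zero for a valid word."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos
    return syndrome


def encode(nibble):
    """Encode 4 data bits as 8 bits: index 0 is overall parity, 1..7 Hamming."""
    word = [0] * 8
    for shift, pos in enumerate(DATA_POSITIONS):
        word[pos] = (nibble >> shift) & 1
    syndrome = syndrome_of(word)
    for p in PARITY_POSITIONS:   # choose check bits that cancel the syndrome
        word[p] = 1 if syndrome & p else 0
    word[0] = sum(word[1:]) % 2  # make the total number of ones even
    return word


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = syndrome_of(word)
    parity_ok = sum(word) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd parity: treat it as a single flip and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome but even parity: two bits flipped, cannot repair.
        return "uncorrectable", None
    nibble = sum(word[pos] << shift for shift, pos in enumerate(DATA_POSITIONS))
    return status, nibble


sent = encode(0b1011)

one_flip = list(sent)
one_flip[5] ^= 1                 # a single noise-induced error on the bus

two_flips = list(sent)
two_flips[5] ^= 1
two_flips[6] ^= 1                # the error "goes beyond one bit"

print(decode(sent))              # ('ok', 11)
print(decode(one_flip))          # ('corrected', 11)
print(decode(two_flips))         # ('uncorrectable', None)
```

Because the check is made on the word as it arrives, it does not matter
whether the bit was flipped in the RAM cell or on the bus on the way over.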

The paper you cite is extremely interesting. However, I would not transfer
the results directly to desktop boards, except loosely, as you have done.
Server boards are designed for reliability at the expense of performance, a
choice I happen to prefer for all my computers, because the performance hit
is not large.
Bob Morein
(310) 237-6511