Group: rec.audio.pro

Professional audio recording and studio engineering.

Post Subject:

PC Motherboard Chipsets and Parts Vendors

Reply from: Mike Rivers
Date: 14 May 2008, 17:56
Re: PC Motherboard Chipsets and Parts Vendors

Soundhaspriority wrote:

> There are no additional parts on the motherboard. There are a few additional
> traces. A non-ECC DIMM has 64 data lines out of either 8 or 16 chips. An ECC
> DIMM has 72 data lines out of 9 or 18 chips.

I wasn't talking about more parts on the motherboard, I was talking
about more parts on the memory module. That's the part that you're
saying fails, and my point is that because it has more parts, it's more
likely to fail.

I'm having a deja vu experience here because it seems that we've talked
about this before, or if not in regard to memory, to something similar
with error correction capabilities (like a CD, for example). If there's
a failure and a memory error gets corrected, that might keep the
computer running in an instance when it might crash without ECC, but
you're really just concealing the problem - unless of course there's a
pop-up message saying "Your ECC memory has just corrected an error.
Replace your memory as soon as possible."

Eventually there will be more errors than it can correct and then it
will fail like any other memory.



--
If you e-mail me and it bounces, use your secret decoder ring and reach
me here:
double-m-eleven-double-zero at yahoo -- I'm really Mike Rivers
(mriv...@d-and-d . com )

Reply from: Steve L.
Date: 14 May 2008, 18:19
Re: PC Motherboard Chipsets and Parts Vendors

Mike Rivers <mrivers@d-and-d . com > said in response to my bewilderment

> Eventually there will be more errors than it can correct and then it
> will fail like any other memory.
>

ECC will only correct single-bit errors. Multiple-bit errors can be a
likely source of a crash, but ECC cannot correct them; it can only pass
the information on to the OS (if the OS supports it; you probably need a
server OS). If cost is an issue: it's more for ECC RAM, more for a
motherboard that supports ECC RAM, and more for an operating system that
will tell you why your system crashed due to multiple-bit errors. How much
more? Conceivably a few hundred to several hundred, depending on the
components. Is it worth it? Well, I think if you back up your data
regularly and note when everything is working fine, you're OK. On the
other hand, if you're somewhat lackadaisical about backing up and an
unrecoverable crash ends up being devastating, it might well be worth it.

This guy has some points to ponder here:
* www.pcguide.com/ref/ram/errECC-c.html




Reply from: Soundhaspriority
Date: 14 May 2008, 19:00
Re: PC Motherboard Chipsets and Parts Vendors


"Mike Rivers" <mrivers@d-and-d . com > wrote in message
news:kYDWj.519$0h.239@trnddc02...
> Soundhaspriority wrote:
> [snip]
> If there's a failure and a memory error gets corrected, that might keep the
> computer running in an instance when it might crash without ECC, but you're
> really just concealing the problem - unless of course there's a pop-up
> message saying "Your ECC memory has just corrected an error. Replace your
> memory as soon as possible."
>
> Eventually there will be more errors than it can correct and then it will
> fail like any other memory.
>
Mike, this reasoning is not valid. We're talking soft errors, which occur
when nothing is wrong with the memory module. The causes are cosmic rays and
poorly modeled noise. The purpose is not concealment. The purpose is to
correct errors that unavoidably occur in properly operating memory modules,
errors that are caused by fundamental forces of the universe. Soft errors
are not a reason to replace RAM, and all grades are equally affected.

When we say "single bit error", we mean one bit per 64-bit word. The memory
can have any number of soft errors in different words, and they will all be
correctable provided there is no more than a single error per 64-bit word.
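The single-error-correct, double-error-detect (SEC-DED) behavior described here, and the 64 vs. 72 data lines mentioned earlier in the thread, can be illustrated with a textbook extended Hamming code. The Python below is a minimal sketch under stated assumptions: it uses the classic layout (check bits at power-of-two positions plus one overall parity bit), and every helper name is hypothetical. Real ECC DIMMs and memory controllers use their own (72,64) code tables, which this does not reproduce. It does show why 64 data bits need 8 extra bits: 7 Hamming check bits plus 1 overall parity bit, 72 bits in all.

```python
from __future__ import annotations

# Toy SEC-DED (single-error-correct, double-error-detect) extended Hamming code.
# Assumption: textbook bit layout, not the layout of any real memory controller.

DATA_BITS = 64


def check_bits_needed(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (enough to locate any single flip)."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


R = check_bits_needed(DATA_BITS)   # 7 Hamming check bits for 64 data bits
N = DATA_BITS + R                  # Hamming positions 1..N (N = 71)


def is_data_position(pos: int) -> bool:
    """Powers of two hold check bits; every other position holds data."""
    return pos & (pos - 1) != 0


def encode(data: int) -> list[int]:
    """Return N + 1 bits; index 0 is the overall parity bit (72 bits total)."""
    word = [0] * (N + 1)
    d = 0
    for pos in range(1, N + 1):
        if is_data_position(pos):
            word[pos] = (data >> d) & 1
            d += 1
    for i in range(R):
        p = 1 << i
        # Check bit p makes the count of 1s among positions with bit p set even.
        word[p] = sum(word[pos] for pos in range(1, N + 1) if pos & p) % 2
    word[0] = sum(word[1:]) % 2    # overall parity tells one flip from two
    return word


def decode(received: list[int]) -> tuple[str, int | None]:
    """Return ("ok" | "corrected", data) or ("uncorrectable", None)."""
    word = list(received)
    syndrome = 0
    for pos in range(1, N + 1):
        if word[pos]:
            syndrome ^= pos            # XOR of set positions; 0 for a clean word
    odd = sum(word) % 2 == 1           # odd overall parity => odd number of flips
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= N:
        word[syndrome] ^= 1            # single flip at `syndrome` (0 = parity bit)
        status = "corrected"
    else:
        # Even number of flips (or an impossible syndrome): detected, not located.
        return "uncorrectable", None
    data, d = 0, 0
    for pos in range(1, N + 1):
        if is_data_position(pos):
            data |= word[pos] << d
            d += 1
    return status, data


if __name__ == "__main__":
    value = 0xDEADBEEFCAFEF00D
    cw = encode(value)
    print(len(cw))                     # 72

    one = cw.copy()
    one[10] ^= 1                       # one soft error in the word
    print(decode(one)[0])              # corrected

    two = cw.copy()
    two[10] ^= 1
    two[40] ^= 1                       # two errors in the same word
    print(decode(two))                 # ('uncorrectable', None)
```

The overall parity bit is what separates "one flip, fix it" from "two flips, report it": a single flip makes the total parity odd, while two flips leave it even but still produce a nonzero syndrome.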

Two-bit errors in the same word are detectable but not correctable. To get
around that problem, ECC motherboards have a special background CPU process
that runs in Ring 0, called scrub. Over a typical 8 hours, scrub reads every
word in RAM. If it finds a correctable error, it writes the corrected word
back out again. The chance of two errors cropping up IN THE SAME WORD before
the first error is scrubbed out works out to about one per 10^13 hours. The
age of the universe is about 1.3x10^14 hours. Thus, a computer might expect
to have had about 13 two-bit errors if it had been running since the
beginning of time.
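The estimate above is plain rate arithmetic. Below is a minimal Python sketch that takes the post's figures as given (one same-word double-bit error per roughly 10^13 hours with scrubbing, and about 1.3x10^14 hours since the beginning of time) and additionally assumes errors arrive at a constant average rate, i.e. as a Poisson process. The function names are illustrative only.

```python
import math

HOURS_PER_YEAR = 365.25 * 24          # 8766 hours


def expected_events(hours: float, mean_hours_between: float) -> float:
    """Expected number of events in a window at a constant average rate."""
    return hours / mean_hours_between


def prob_at_least_one(hours: float, mean_hours_between: float) -> float:
    """P(at least one event), assuming a Poisson process (an assumption)."""
    # -expm1(-x) equals 1 - exp(-x), but stays accurate for tiny x.
    return -math.expm1(-expected_events(hours, mean_hours_between))


# Figures taken from the post, not independently verified:
MEAN_HOURS_BETWEEN_DOUBLE_ERRORS = 1e13   # same-word double error, with scrubbing
AGE_OF_UNIVERSE_HOURS = 1.3e14

print(expected_events(AGE_OF_UNIVERSE_HOURS, MEAN_HOURS_BETWEEN_DOUBLE_ERRORS))  # 13.0
print(prob_at_least_one(5 * HOURS_PER_YEAR, MEAN_HOURS_BETWEEN_DOUBLE_ERRORS))
```

The first figure is 13, the same order of magnitude as the post's estimate. The second, five years of round-the-clock use, comes out at roughly 4 in a billion under the same assumptions.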

This is not fringe stuff. Google has around a hundred thousand blade servers
in racks that cover acres, and all of them use ECC technology. Every Avid
certified workstation has ECC ram. The only reason it isn't part of the
consumer market is that it's so hard for the consumer to understand why he
should have it.

The blade that archives this conversation has ECC ram.

Bob Morein
(310) 237-6511










Reply from: Mike Rivers
Date: 15 May 2008, 00:09
Re: PC Motherboard Chipsets and Parts Vendors

Soundhaspriority wrote:

> Mike, this reasoning is not valid. We're talking soft errors, which occur
> when nothing is wrong with the memory module.

If there's nothing wrong with the memory module, then why is there an
error? If there's an error, there's something wrong with the memory
module, if only intermittently.

> The causes are cosmic rays and
> poorly modeled noise.

I don't know what "poorly modeled noise" is, but you're not going to
convince me that cosmic rays cause computer errors no matter how many
web sites you find that say they do. Maybe in the lab, but in my house?
Inside a solid metal case? Should I be wearing a metal helmet?

> The purpose is not concealment. The purpose is to
> correct errors that unavoidably occur in properly operating memory modules,

Why would anyone sell a computer device that is allowed to make errors
and still considered to be working normally? Isn't there a better
terminology you can use?

> This is not fringe stuff. Google has around a hundred thousand blade servers
> in racks that cover acres, and all of them use ECC technology.

If I had Google's money and technology support resources, I'd do what
they tell me to do. But I'm just an occasional user and I don't work my
computers so hard that they crash. If I have soft errors, I don't know it.

> The only reason it isn't part of the
> consumer market is that it's so hard for the consumer to understand why he
> should have it.

With explanations like "soft errors but working normally" and "cosmic
rays," and most important, that non-ECC is the most common type of
memory, it's no wonder the consumer doesn't understand why he needs it.
While you're at it, why not try to convince people that they should
listen to 24-bit DVDs instead of MP3s?





Reply from: Soundhaspriority
Date: 15 May 2008, 00:49
Re: PC Motherboard Chipsets and Parts Vendors


"Mike Rivers" <mrivers@d-and-d . com > wrote in message
news:IqJWj.17747$%X1.9083@trnddc08...
> Soundhaspriority wrote:
> [snip]
> I don't know what "poorly modeled noise" is, but you're not going to
> convince me that cosmic rays cause computer errors no matter how many web
> sites you find that say they do. Maybe in the lab, but in my house? Inside
> a solid metal case? Should I be wearing a metal helmet?
>
Oh boy! Yes, Mike, the answer is yes, cosmic rays are the primary culprit.

Back when I was starting out, one could put a UV ROM chip under a
microscope, and see the features. In a few more years, the geometry shrank
out of sight. Today, the features on a modern RAM chip are about 85nm --
that's 85 billionths of a meter, or 85 millionths of a millimeter. A cesium
atom is about .5 nm in size. So the entire memory cell is only about 170
atoms wide. A DRAM cell is just a capacitor, and an ionizing particle, such
as a cosmic ray, creates a conductive path for the charge to leak away
before the next refresh cycle. The neutron is another source of this
problem. The neutron flux at the surface of the Earth is substantial.

Cosmic rays pass through our bodies constantly. DNA has repair mechanisms,
but older individuals do have more damaged DNA than younger. But normal
oxidative stress causes about 100X the damage to organisms as cosmic rays.
And a metal helmet would not stop them anyway :)

Physics has many weird facts. Here's another: The neutrino is a nearly
massless particle that usually passes through the entire Earth without
stopping. We're all hit by trillions of neutrinos every second, with no
effect, because they interact so weakly with matter. A nuclear power plant
loses a few percent of its potential power to neutrinos escaping from the
reactor core, which pass intensely through everyone living nearby, but with
no effect.

Bob Morein
(310) 237-6511



Reply from: Mike Rivers
Date: 15 May 2008, 02:26
Re: PC Motherboard Chipsets and Parts Vendors

Soundhaspriority wrote:

> Oh boy! Yes, Mike, the answer is yes, cosmic rays are the primary culprit.

When I was in high school, the Wakefield Rocket Society (read about us
in Time magazine) had "the Cosmic Ray Project" where we built some
sensors, mounted them on the roof of the school, and counted cosmic
rays. If those sensors are still there 50 years later I'll bet they're
still waiting for a cosmic ray. Well, we did get a few, but not many.



Reply from: Scott Dorsey
Date: 15 May 2008, 02:37
Re: PC Motherboard Chipsets and Parts Vendors

Mike Rivers <mrivers@d-and-d . com > wrote:
>Soundhaspriority wrote:
>> Oh boy! Yes, Mike, the answer is yes, cosmic rays are the primary culprit.
>
>When I was in high school, the Wakefield Rocket Society (read about us
>in Time magazine) had "the Cosmic Ray Project" where we built some
>sensors. mounted them on the roof of the school, and counted cosmic
>rays. If those sensors are still there 50 years later I'll bet they're
>still waiting for a cosmic ray. Well, we did get a few, but not many.

Actually, there are a bunch of them. They are very high energy, though,
so they tend to go through things without doing much, but they are sadly
the main culprit in fogging of film stored at low temperature. If you
develop a fine grain film you can even see the track as it went through
the emulsion.
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."

Reply from: Soundhaspriority
Date: 15 May 2008, 03:02
Re: PC Motherboard Chipsets and Parts Vendors


"Scott Dorsey" <kludge@panix . com > wrote in message
news:g0g0kp$qgb$1@panix2.panix . com ...
> [snip]
> Actually, there are a bunch of them. They are very high energy, though,
> so they tend to go through things without doing much, but they are sadly
> the main culprit in fogging of film stored at low temperature.
> [snip]

Yep, I've got a bunch of Kodak T-500 short ends that probably aren't too
good any more.

Bob Morein
(310) 237-6511



Reply from: Richard Crowley
Date: 15 May 2008, 01:50
Re: PC Motherboard Chipsets and Parts Vendors

"Mike Rivers" wrote ...
> Soundhaspriority wrote:
>> Mike, this reasoning is not valid. We're talking soft errors, which occur
>> when nothing is wrong with the memory module.
>
> If there's nothing wrong with the memory module, then why is there an
> error? If there's an error, there's something wrong with the memory
> module, if only intermittently.

Mike, that's like saying that you have defective paint on your
car if it gets dinged by one of those baseball-size hailstones.
There's nothing wrong with the paint. You can't protect against
giant hailstones unless you want to drive an armored tank.

Cosmic "rays" (actually particles) come from outer space and
pass through everything. They use detectors hundreds of metres
underground to measure them. They can cause random "noise"
in most any chip (not just memory). There is no known way to
block them, so you can't "shield" your computer.
* en.wikipedia.org/wiki/Cosmic_ray

Fortunately, the likelihood of a cosmic ray flipping a bit in your
computer memory is too low to make ECC worth the expense
unless you're running something high-stakes (life-safety systems,
servers for hundreds or thousands of users, etc.)

"The error rate in today's consumer-level memory is so low
so that for most everyday applications, adding ECC is pure
overkill. For standard DDR2 memory, the error rate is
something like 100 soft errors over 1 billion device hours.
If there are 16 memory devices or chips on a given module,
that translates to one soft error every 30 years. Even if you
only have two such DIMMs in a system, that's still less than
one error for more than the lifetime of the system as a whole."
* searchwincomputing.techtarget.com/tip/0,289483,sid68_gci1251848,00.html
Thanks to "JulienBH" for this reference. There are more but
I'm not motivated to find them.
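The quoted figure of 100 soft errors per billion device-hours is a rate of 100 FIT (failures in time, counted per 10^9 device-hours). A minimal Python sketch of the per-module arithmetic follows, assuming each chip errs independently at that rate so that per-chip rates simply add; the function name is illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24


def years_between_errors(fit_per_device: float, devices: int) -> float:
    """Mean years between soft errors for `devices` chips at `fit_per_device`.

    FIT = failures per 1e9 device-hours; independent chips simply add up.
    """
    errors_per_hour = fit_per_device * devices / 1e9
    return 1.0 / errors_per_hour / HOURS_PER_YEAR


QUOTED_FIT = 100   # "100 soft errors over 1 billion device hours"

print(round(years_between_errors(QUOTED_FIT, 16), 1))   # one 16-chip DIMM: 71.3
print(round(years_between_errors(QUOTED_FIT, 32), 1))   # two such DIMMs: 35.6
```

On these assumptions one module works out to about 71 years between errors and two modules to about 36, so the quoted "one soft error every 30 years" is best read as a rough order of magnitude.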

> > The causes are cosmic rays and poorly modeled noise.
>
> I don't know what "poorly modeled noise" is,

Perhaps Bob will explain.

> but you're not going to convince me that cosmic rays cause computer errors
> no matter how many web sites you find that say they do. Maybe in the lab,
> but in my house? Inside a solid metal case? Should I be wearing a metal
> helmet?

They CAN cause memory (and even CPU) errors, and
nobody disputes that. The issue is whether it happens
often enough to warrant spending extra $$$ on ECC to
detect and/or correct it. I'd pay $5-10 extra for ECC,
but not $50-100. It's just not worth it.

Case in point: Virtually none of us run computers with ECC
and it is unlikely that any of us have experienced a significant
problem from a cosmic "ray" causing an error in our computers.
There are dozens of different hazards to yourself, your property,
your computer, your media, your data, etc. that are much more
likely (and worth spending $$ to protect against or mitigate)
than cosmic rays causing soft errors.

> Why would anyone sell a computer device that is allowed to make errors and
> still considered to be working normally?
> Isn't there a better terminology you can use?

There are cosmic "rays" passing through your roof and maybe
even through your brain right now as you read this. Fortunately
our bodies, and most of what we make and use are not affected
by this natural phenomenon. Things as microscopic and sensitive
as integrated circuits ARE susceptible, but both scientific data
and actual real-world experience suggest that it doesn't make
the top 10 list of things to worry about happening to your PC.

Cosmic "rays" cause a tiny fraction of the noise we hear in
any analog circuit, but it is a ~10th order effect compared to
the much more common things like Johnson-Nyquist noise
(i.e. "thermal noise").

>> This is not fringe stuff. Google has around a hundred thousand blade
>> servers in racks that cover acres, and all of them use ECC technology.
>
> If I had Google's money and technology support resources, I'd do what they
> tell me to do. But I'm just an occasional user and I don't work my
> computers so hard that they crash. If I have soft errors, I don't know it.

Most servers use ECC because it is prudent (and not a significant
cost differential in "heavy-iron" computing). But most end-user
computers ("workstations") do not use ECC because the cost vs.
benefit ratio is significantly negative.

"To alleviate this problem, Intel has proposed a cosmic ray
detector which could be integrated into future high-density
microprocessors, allowing the processor to repeat the last
command following a cosmic ray event."
* en.wikipedia.org/wiki/Cosmic_ray

"The risk from cosmic rays may not be thought of as a big
problem on a single computer with a single chip, as there is
the potential for error only perhaps every several years.
But Mr Hannah explained that on a supercomputer with
10,000 chips, there was the potential for 10 or 20 faults
a week.....
"He said that discussions are now under way within Intel
about how to build such a detector and see how it works.
"But he admitted that it will be hard to say when such a
device may become a practical reality. "
* news.bbc.co.uk/2/hi/technology/7335322.stm
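The BBC quote is the same rate arithmetic run the other way: a per-chip risk measured in years becomes a weekly event across thousands of chips. Here is a small Python sketch that assumes the chips fail independently and takes the quoted 10 to 20 faults per week across 10,000 chips at face value. The helper names are hypothetical.

```python
DAYS_PER_YEAR = 365.25
WEEKS_PER_YEAR = DAYS_PER_YEAR / 7


def per_chip_years_between_faults(fleet_faults_per_week: float, chips: int) -> float:
    """Mean years between faults for one chip, given the fleet-wide fault rate."""
    weeks = chips / fleet_faults_per_week
    return weeks / WEEKS_PER_YEAR


def fleet_faults_per_week(per_chip_years: float, chips: int) -> float:
    """Inverse: expected faults per week across `chips` independent chips."""
    return chips / (per_chip_years * WEEKS_PER_YEAR)


for faults in (10, 20):
    print(faults, round(per_chip_years_between_faults(faults, 10_000), 1))
# 10 faults/week -> about 19.2 years per chip; 20 faults/week -> about 9.6 years
```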

>> The only reason it isn't part of the consumer market is that it's so hard
>> for the consumer to understand why he should have it.

You'd think that if it were that critical more people would
perceive it. OTOH..., well, draw your own conclusions.

> With explanations like "soft errors but working normally" and "cosmic
> rays," and most important, that non-ECC is the most common type of memory,
> it's no wonder the consumer doesn't understand why he needs it. While
> you're at it, why not try to convince people that they should listen to
> 24-bit DVDs instead of MP3s?

An excellent point. If SER (soft error rate) were any kind of significant issue
for the average computer user, you can be sure that the
people who market them would jump at the chance to prove
to customers that the extra $$ was worth it. Instead, even
people who make ECC admit that it isn't really necessary
in your average PC.



Reply from: Soundhaspriority
Date: 15 May 2008, 02:09
Re: PC Motherboard Chipsets and Parts Vendors


"Richard Crowley" <rcrowley@xp7rt . net > wrote in message
news:691c6aF2v01nrU1@mid.individual . net ...
[snip]
>
> "The error rate in today's consumer-level memory is so low
> so that for most everyday applications, adding ECC is pure
> overkill. For standard DDR2 memory, the error rate is
> something like 100 soft errors over 1 billion device hours.
> If there are 16 memory devices or chips on a given module,
> that translates to one soft error every 30 years. Even if you
> only have two such DIMMs in a system, that's still less than
> one error for more than the lifetime of the system as a whole."
> * searchwincomputing.techtarget.com/tip/0,289483,sid68_gci1251848,00.html
> Thanks to "JulienBH" for this reference. There are more but
> I'm not motivated to find them.
>

[snip]

Note: The above statement is an individual's opinion, not accepted fact.
From Samsung's website,
* www.samsung.com/global/business/semiconductor/products/dram/Products_DDR2SDRAM.html
the error rate for FBDIMM, which is a much higher quality device, is given
as less than 1 error per 100 years. FBDIMM is an ECC corrected buffered
device that is much more reliable than the unbuffered memory we are
discussing.


"With system reliability so essential in servers as they take on an
ever-greater workload, the FB DIMM architecture helps deliver exceptional
reliability. Most notable is a silent data error rate of one per 100 years
or less, enabled by a robust CRC scheme that protects both commands and
data. Also boosting reliability are features such as transient bit-error
detection and retry and "bit-lane fail-over correction." This enables the
server board to shut down a bad data path on the fly. Optional bit widths
and CRC coverage will be applicable to a wide range of server applications.
"

I dispute Rich Crowley's citation.
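The CRC in the Samsung quote is an error-detecting check: the receiver recomputes it and, on a mismatch, the transfer can be retried, which is what "transient bit-error detection and retry" refers to. The quote does not give the polynomial or width FB-DIMM actually uses, so the Python sketch below substitutes the standard CRC-32 from `zlib` purely as an illustration. The frame contents and helper names are hypothetical.

```python
from __future__ import annotations

import zlib


def send(payload: bytes) -> tuple[bytes, int]:
    """Sender side: transmit the payload along with its CRC-32."""
    return payload, zlib.crc32(payload)


def crc_ok(payload: bytes, crc: int) -> bool:
    """Receiver side: recompute the CRC and compare."""
    return zlib.crc32(payload) == crc


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a transient upset by inverting one bit."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


payload, crc = send(b"WRITE addr=0x1F40 data=0123456789abcdef")  # hypothetical frame
print(crc_ok(payload, crc))                                          # True

# Every possible single-bit error is caught, so the receiver can ask for a retry.
caught = all(not crc_ok(flip_bit(payload, b), crc) for b in range(len(payload) * 8))
print(caught)                                                        # True
```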

Bob Morein
(310) 237-6511



Reply from: Richard Crowley
Date: 15 May 2008, 02:17
Re: PC Motherboard Chipsets and Parts Vendors

"Soundhaspriority" wrote ...
> I dispute Rich Crowley's citation.

And a pleasant good evening to you, too, Bob.



Reply from: Soundhaspriority
Date: 15 May 2008, 02:23
Re: PC Motherboard Chipsets and Parts Vendors


"Richard Crowley" <rcrowley@xp7rt . net > wrote in message
news:691dovF2u62lvU1@mid.individual . net ...
> "Soundhaspriority" wrote ...
>> I dispute Rich Crowley's citation.
>
> And a pleasant good evening to you, too, Bob.
>
I doff my hat to you, Rich.

Bob Morein
(310) 237-6511



Reply from: Ralph Barone
Date: 15 May 2008, 06:03
Re: PC Motherboard Chipsets and Parts Vendors

In article <Tf2dnQC6Bpv4H7bVnZ2dnUVZ_tTinZ2d@giganews . com >,
"Soundhaspriority" <nowhere@nowhere . com > wrote:

> "Richard Crowley" <rcrowley@xp7rt . net > wrote in message
> news:691dovF2u62lvU1@mid.individual . net ...
> > "Soundhaspriority" wrote ...
> >> I dispute Rich Crowley's citation.
> >
> > And a pleasant good evening to you, too, Bob.
> >
> I doff my hat to you, Rich.
>
> Bob Morein
> (310) 237-6511

Hey, you two! Knock it off! This is a family newsgroup...

Reply from: Mike Rivers
Date: 15 May 2008, 02:44
Re: PC Motherboard Chipsets and Parts Vendors

Richard Crowley wrote:

> Case in point: Virtually none of us run computers with ECC
> and it is unlikely that any of us have experienced a significant
> problem from a cosmic "ray" causing an error in our computers.

Maybe I'm getting my memories confused, but didn't all memory (back in
the "640 K should be enough for anyone" days) have error checking and
correction? There was always a parity bit, and I assume that's what it
was for. It wasn't until some time later, in the **MM days, when they
started leaving the parity bit (a chip) off to reduce cost.

> There are cosmic "rays" passing through your roof and maybe
> even through your brain right now as you read this.

No wonder I have a backache and I can't get an erection. ;)

> Most servers use ECC because it is prudent (and not a significant
> cost differential in "heavy-iron" computing). But most end-user
> computers ("workstations") do not use ECC because the cost vs.
> benefit ratio is significantly negative.

That's what I was figuring. I wouldn't want a stray cosmic ray to tell
an airplane I'm flying in to go down when it's supposed to go up, but I
just naturally expect a click now and then in an audio file. I usually
blame it on a bit of saliva in the singer's mouth or a guitar pick
hitting the pickup.

> If SER was any kind of significant issue
> for the average computer user, you can be sure that the
> people who market them would jump at the chance to prove
> to customers that the extra $$ was worth it.

I don't think they'd have to. They'd just stop making non-ECC memory and
the price of computers would go up, along with the reliability. Nothing
to brag about, just "computers work better now." But apparently the
problem isn't big enough to worry consumers, so the manufacturers are
giving them cheaper equipment. I don't think this is Muntz engineering
(remove parts until it's bad enough so that it doesn't work, then put
the last part back) at work here, I think it's just that components are
good enough for who they're for, and if you think you're better than
average, there are better components available - just like microphones
or D/A converters.




Reply from: Soundhaspriority
Date: 15 May 2008, 02:58
Re: PC Motherboard Chipsets and Parts Vendors


"Mike Rivers" <mrivers@d-and-d . com > wrote in message
news:nHLWj.8680$Uz2.2453@trnddc06...
> Richard Crowley wrote:
>
>> Case in point: Virtually none of us run computers with ECC
>> and it is unlikely that any of us have experienced a significant
>> problem from a cosmic "ray" causing an error in our computers.
>
> Maybe I'm getting my memories confused, but didn't all memory (back in the
> "640 K should be enough for anyone" days) have error checking and
> correction? There was always a parity bit, and I assume that's what it was
> for. It wasn't until some time later, in the **MM days, when they started
> leaving the parity bit (a chip) off to reduce cost.
>
It had a parity bit, but it didn't have correction. It wasn't economically
feasible until the width of a memory module hit 64 bits. We sure could have
used it -- those were the bad old days! ;)
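The difference between the old parity bit and ECC shows up in a few lines. PC parity memory stored one parity bit per byte (9 bits per byte). The minimal Python sketch below assumes even parity, though which sense a given machine used does not change the argument, and its function names are illustrative. A parity mismatch says that some bit in the byte flipped but not which one, so there is nothing to correct. Two flips in the same byte cancel out and go unnoticed.

```python
def even_parity(byte: int) -> int:
    """Parity bit that makes the total number of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored: int) -> bool:
    """True if the byte still agrees with its stored parity bit."""
    return even_parity(byte) == stored


original = 0b1011_0010               # four 1s, so the stored parity bit is 0
stored = even_parity(original)

one_flip = original ^ 0b0000_1000    # one bit changed
two_flips = original ^ 0b0100_0001   # two bits changed

print(parity_ok(original, stored))   # True
print(parity_ok(one_flip, stored))   # False: detected, but not which bit
print(parity_ok(two_flips, stored))  # True: the two flips cancel out
```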

Bob Morein
(310) 237-6511



