
June 5, 2007

Bit Flip

I got a frantic email from John Baez on Saturday night. Evidently, our MovableType installation had suddenly gone haywire, and various CGI actions were producing Perl errors rather than the desired results.

It sometimes happens that my mucking about with the software produces untoward side-effects. But that wasn’t the case here. I’d been out for the evening, and hadn’t touched anything.

Eventually, I tracked down the problem. The file in which the Perl error occurred had changed. A single occurrence of “>” had become a “?”. Changing it back cured the problem.

Mind you:

  • The file is writable only by root.
  • The mtime of the file had not changed.
  • There were no other changes to the file, or to anything else in our MT installation.
  • There were no signs of intrusion or system compromise.
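
An unchanged mtime is what makes a change like this hard to spot: nothing in the file's metadata advertises it. As a minimal sketch (in Python, not something that was run on this server), one could record a content digest alongside the mtime and compare later. The helper names `snapshot` and `silently_changed` are hypothetical.

```python
import hashlib
import os


def snapshot(path):
    """Record the SHA-256 digest and mtime of a file (hypothetical helper)."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"sha256": digest, "mtime_ns": os.stat(path).st_mtime_ns}


def silently_changed(old, new):
    """True if the content differs although the recorded mtime is identical."""
    return old["sha256"] != new["sha256"] and old["mtime_ns"] == new["mtime_ns"]
```

Taking a snapshot of each installed file once, and re-running it periodically, would flag exactly the situation described in the list above: different bytes, same mtime.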

Now, “>” is the byte 00111110 and “?” is the byte 00111111. So a single bit, the least significant one, had changed. This was quite enough to send our MovableType installation into upheaval. As far as I can tell, it happened without the intervention of human hand.
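
The byte arithmetic can be checked with an XOR. The following is only an illustrative sketch in Python; `flipped_bits` is a hypothetical helper name.

```python
def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = a ^ b  # XOR leaves a 1 exactly where the two bytes disagree
    return [i for i in range(8) if (diff >> i) & 1]


gt, qm = ord(">"), ord("?")
print(format(gt, "08b"), format(qm, "08b"))  # 00111110 00111111
print(flipped_bits(gt, qm))                  # [0]
```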

I’ve read about cosmic rays causing bit-flips. Seemed like a rather unlikely risk. Now I’m not so sure …

Posted by distler at June 5, 2007 11:29 PM

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1305

8 Comments & 1 Trackback

Re: Bit Flip

Many years ago, in the days of DOS, I once encountered bit flips in files as a prelude to hard drive failure.

Posted by: Georg on June 6, 2007 1:33 PM | Permalink | Reply to this

Re: Bit Flip

On modern operating systems, hard drive failures are usually presaged by ominous diagnostic messages in the system logs. There weren’t any such messages in the logs.

Posted by: Jacques Distler on June 6, 2007 2:20 PM | Permalink | PGP Sig | Reply to this

Re: Bit Flip

Perhaps it is good news then that Leopard apparently will have ZFS as a default filesystem - it automagically looks for and corrects bit rot in the background.

Posted by: Mark on June 6, 2007 7:56 PM | Permalink | Reply to this

ZFS

ZFS? Whoa. Rock on!

Seriously, though, the rumour that ZFS will be the default file system in Leopard seems a little far-fetched, given that not even Sun is shipping an operating system bootable under ZFS.

But, hey, sometimes dreams come true.

Posted by: Jacques Distler on June 6, 2007 11:27 PM | Permalink | PGP Sig | Reply to this

Re: Bit Flip

The actual cause doesn’t really matter, but since this is about physics: for computer memory housed on a planet, bit flips are much more likely to be due to alpha decay in relatively nearby materials, as remarked in this ref.

Posted by: dave tweed on June 10, 2007 1:39 PM | Permalink | Reply to this

Ionizing radiation

Speaking of physics, the ionizing radiation clearly creates electron/hole pairs. Since the bit stored in the DRAM depends on the electron concentration in the capacitor well, the transitions are asymmetric. In general, an alpha particle hitting the DRAM will switch “1 to 0” and not “0 to 1”.

This happens a lot: expect about 1 bit flip per gigabyte per month. Your server does have ECC memory or similar error correction, doesn’t it? ;-) Still, any given gigabyte of memory contains almost no human-readable data, so you are either extremely lucky or have a giant error rate. If the actual problem was memory corruption, I would run memtest86 for a day or two. Even good memory modules die eventually; I just replaced some earlier this year.
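
To put the rate quoted above in numbers, here is a rough sketch in Python. It takes the figure of 1 flip per gigabyte per month at face value and assumes flips are spread uniformly over memory; the 50 KiB file size in the usage note is purely hypothetical, and the function names are made up for illustration.

```python
GIB = 2**30  # bytes in one gibibyte


def expected_flips(size_bytes, months, flips_per_gib_month=1.0):
    """Expected number of bit flips in `size_bytes` of memory over `months`,
    assuming a uniform rate (an assumption, not a measurement)."""
    return flips_per_gib_month * (size_bytes / GIB) * months


def months_per_flip(size_bytes, flips_per_gib_month=1.0):
    """Average waiting time, in months, for one flip in `size_bytes`."""
    return 1.0 / expected_flips(size_bytes, 1, flips_per_gib_month)
```

For example, `months_per_flip(50 * 1024)` gives the average wait for a flip inside one particular 50 KiB file held in memory, which is why a hit on a single human-readable source file points to either luck or an unusually high error rate.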

The alternative would be a hard drive error. This could be checked by rebooting or remounting the partition and seeing if the bit flip persists. It probably would have messed up more than a single bit, though.

Posted by: Volker Braun on June 17, 2007 1:13 PM | Permalink | Reply to this

Another story

Bernhard Schmalhofer:

On Friday a colleague @work asked me about an error in a web application. There was a Perl syntax error in Sys::Hostname. As I saw no reason that anybody should mess with Sys::Hostname, I checked the time Hostname.pm was last changed. Confusingly, the last change was in 2004; apparently this was the time the server was set up. A diff with another Perl 5.8.0 installation showed a single bit change. The first space, 0x20, of Hostname.pm line 104 had turned into a ‘(’, 0x28. The only explanation I can imagine is that a bit has flipped in file cache.
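
The kind of comparison described in the quote can be sketched as a byte-by-byte scan of a suspect file against a known-good copy. This is a hypothetical Python helper (`byte_diffs` is an invented name), not the diff actually used in either incident.

```python
def byte_diffs(good, bad):
    """Yield (offset, good_byte, bad_byte, bits_changed) for each differing byte.

    `good` and `bad` are bytes objects of equal length, e.g. the contents of a
    known-good copy and of the suspect file.
    """
    if len(good) != len(bad):
        raise ValueError("inputs differ in length; this is not a pure bit flip")
    for offset, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            # The XOR has a 1 for every bit that changed; count them.
            yield offset, g, b, bin(g ^ b).count("1")
```

Reading both files with `open(path, "rb").read()` and passing the results in would, for a case like the one above, report a single entry with `bits_changed` equal to 1 (0x20 against 0x28 differ only in bit 3).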

Strange.

Most striking is the fact that both cases involved a Perl module…

(PS.: why no <cite> allowed?)

Posted by: Aristotle Pagaltzis on June 25, 2007 12:59 AM | Permalink | Reply to this

Re: Another story

PS.: why no <cite> allowed?

No good reason; just that no one’s asked before. We do allow the @cite attribute (as you can see), and even provide nice JavaScript linking.

I’ll add <cite> when I get around to it.

But thanks for confirming that this phenomenon is real. (And a little frightening.)

If it really is a bit-flip in the onboard disk cache, I imagine that it would require a power-down to reset it. Which is better than having a magnetic domain flip, but still …

Posted by: Jacques Distler on June 25, 2007 1:29 AM | Permalink | PGP Sig | Reply to this
Read the post Faulty Memory
Weblog: Musings
Excerpt: Stuck bit.
Tracked: May 2, 2008 12:43 PM
