gr ([info]grumpy_sysadmin) wrote in [info]unixadmin,

Hey Red Hat, screw you again!

We've had a periodic problem on a set of identical Dell PowerEdge 2550s where top bombs out with a floating point exception. I first suspected that the kernel had fallen out of sync with the userland (the latter having been upgraded but the former not), or even that I had installed a new kernel without rebooting into it, that Red Hat was (foolishly) overwriting the old kernel's shared objects on disk, and that we then got screwed on paging.

Nope, this is yet another (see also) known, reported bug that Red Hat simply refused to fix... and now they've discontinued the 7.x series, so I can yell and scream at them all I want without getting it fixed. At least they labeled this one RAWHIDE ("We've fixed it internally, we don't care about you people with production systems that have paid for our product.") rather than WONTFIX ("Your opinion as our customer doesn't matter at all. We're going to do it our way, and you should shut up and like it.").

I simply can't be bothered to reboot production systems because some nitwit used a float when he should have used an unsigned float, and my users wouldn't be too happy about it either. So I went and downloaded procps-2.0.7-25.src.rpm, which was released for Red Hat 8.0, did an rpm --rebuild procps-2.0.7-25.src.rpm, then did an rpm -U /usr/src/redhat/RPMS/i386/procps-2.0.7-25.i386.rpm. Note that it's only through the rpmfind.net community effort that it was even possible to get this file. What the hell am I paying Red Hat for?

(Yes, I could have just built the stuff that RH's procps contains from the latest source, but I prefer to use the existing package management system if at all possible, since, even with RPM, it lends some sanity to managing software on Unix and Linux systems.)
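
For anyone who wants to repeat the rebuild-and-upgrade steps above on several boxes, here is a rough sketch as a shell script. It only prints the commands unless DRY_RUN=0 is set. The helper names (run, upgrade_from_srpm) and the DRY_RUN switch are made up for illustration; the output directory is the one from the post, and newer rpm releases may want rpmbuild --rebuild in place of rpm --rebuild.

```shell
#!/bin/sh
# Sketch of the rebuild-and-upgrade steps from the post. Prints the commands
# by default; set DRY_RUN=0 to actually run them (as root, on an RPM system).

# Assumption: rpm writes rebuilt i386 packages here, as in the post.
RPM_OUT_DIR=/usr/src/redhat/RPMS/i386

# Hypothetical helper: echo a command, and run it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical helper: rebuild a source RPM, then upgrade to the result.
upgrade_from_srpm() {
    srpm=$1
    # procps-2.0.7-25.src.rpm -> procps-2.0.7-25.i386.rpm
    binary_rpm="$(basename "$srpm" .src.rpm).i386.rpm"
    run rpm --rebuild "$srpm" &&
        run rpm -U "$RPM_OUT_DIR/$binary_rpm"
}

upgrade_from_srpm procps-2.0.7-25.src.rpm
```

Chaining the two steps with && means the upgrade is skipped if the rebuild fails, which is the behavior you want on a production box.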

  • 10 comments

[info]fin9901

February 6 2004, 08:11:13 UTC 8 years ago

Oddly enough, I've seen top crash on FreeBSD when I've left it running overnight, though via floating point exception, not segfault.

[info]grumpy_sysadmin

February 6 2004, 08:21:58 UTC 8 years ago

I misspoke. The problem on RH was, in fact, a floating point exception. Which means that this was a signed float of some sort, not an int.

Both corrected in the above.

As for FreeBSD, you might want to check the same libraries, assuming FreeBSD's top gets its information from /proc:

# rpm -q --provides procps
libproc.so.2.0.7
procps = 2.0.7-25


(FreeBSD may not call this libproc.so, necessarily, but ldd `which top` should clear up what it does call it.)

[info]fin9901

February 6 2004, 12:03:30 UTC 8 years ago

The box that I've noticed it happening on is running FreeBSD 3.2 (yes, I know that's ancient); I don't recall it happening on the FreeBSD 4.4 box I have. Eventually I'll get around to reinstalling that box; I've been waiting for 5.x to become stable but I'm losing patience.

[info]pfak

February 6 2004, 10:53:15 UTC 8 years ago

Re:

I've never had top crash on any of my *BSD boxes, and I left a top session running for 211 days (until a power failure last night). Must be something wrong with your server.

[info]grumpy_sysadmin

February 6 2004, 11:14:08 UTC 8 years ago

The problem that happened with Red Hat could happen with any other operating system or distribution and, in fact, I have reported exactly the same error in NetBSD previously. It's pure hubris to imagine that this "can't happen" because it's BSD rather than Linux.

The problem here is in the vendor's response to the bug report, not in the existence of the bug. Bugs happen in all complicated software projects. This shouldn't be shocking.

So, back to the specific point, it's almost definitely a coding error in FreeBSD's top(1) (which may be from the ports tree; it certainly comes from pkgsrc on NetBSD) or in a shared library on which it depends. It's rather unlikely that this is a hardware problem, or other applications would exhibit similar symptoms.

[info]reddragdiva

February 6 2004, 08:38:32 UTC 8 years ago

"What the hell am I paying Red Hat for?"

Vendor certification for Oracle, etc., is the only possible reason, which is why the lowest possible price for official Red Hat keeps going up.

[info]stormgren

February 6 2004, 13:23:09 UTC 8 years ago

Re:

Yech. Given how much vendor support usually sucks, I figure I'm on my own and run whatever distro I'd like.

Red Hat is for the birds.

Hopefully with Novell's acquisition of SuSE, there might be hope.

[info]grumpy_sysadmin

February 6 2004, 14:25:37 UTC 8 years ago

Those of us who bother to justify our actions to our superiors often don't have that option.

[info]pfak

February 6 2004, 10:52:38 UTC 8 years ago

That's what you get for using Red Hat :-)

[info]grumpy_sysadmin

February 6 2004, 11:14:42 UTC 8 years ago

You need to read [info]reddragdiva's comment again.