[phxjug-java] Help Wanted: Squashing JVM on Linux Bug

Steve Jovanovic plug-discuss@lists.plug.phoenix.az.us
Tue, 10 Jun 2003 01:06:07 -0500


Hi Huang and Chris,

Thanks very much for your suggestions and thoughts!

We're running RedHat 7.2. I checked with our system administrator, and
we're definitely not having a problem with file descriptor limits. I
continue to see things like this in the process table:

tangent:~/.ssh$ ps aux | grep java
skribe   21348  0.0  0.5 224256 5204 ?       SN   Jun07   0:00 /home/skribe/java
skribe   21351  0.0  0.5 224256 5204 ?       SN   Jun07   0:00 /home/skribe/java
skribe   21352  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]
skribe   21353  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]
skribe   21354  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]
skribe   21355  0.0  0.0     0    0 ?        ZN   Jun07   0:07 [java <defunct>]
skribe   21356  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]
skribe   21357  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]
skribe   21358  0.0  0.0     0    0 ?        ZN   Jun07   0:01 [java <defunct>]
skribe   21367  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]
skribe   21368  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]
skribe   21369  0.0  0.0     0    0 ?        ZN   Jun07   0:03 [java <defunct>]
skribe   17657  0.0  0.5 224256 5204 ?       SN   Jun08   0:00 /home/skribe/java
skribe   17730  0.0  0.5 224256 5204 ?       SN   Jun08   0:00 /home/skribe/java
skribe   20572  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20589  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20590  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20591  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20592  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20596  0.0  0.5 224984 5704 ?       S    Jun08   0:04 /usr/local/java/b
skribe   20597  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20600  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20601  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   20706  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b
skribe   22637  0.0  0.0  1764  628 pts/121  S    22:45   0:00 grep java
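
For anyone who wants to tally a listing like the one above rather than
eyeball it, here is a minimal sketch in Python. It assumes the standard
`ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START,
TIME, COMMAND) and one process per line; the function names and the
SAMPLE rows (copied from the listing) are only illustrative.

```python
from collections import Counter

def parse_ps_line(line):
    """Split one `ps aux` row into (pid, stat, command), or None if it is not a data row."""
    fields = line.split(None, 10)  # COMMAND may contain spaces, so cap the split
    if len(fields) < 11:
        return None
    try:
        pid = int(fields[1])
    except ValueError:
        return None  # header row ("USER PID ...") or other noise
    return pid, fields[7], fields[10]

def summarize(lines):
    """Count processes by the first letter of STAT and collect zombie PIDs."""
    states = Counter()
    zombies = []
    for line in lines:
        row = parse_ps_line(line)
        if row is None:
            continue
        pid, stat, _command = row
        states[stat[0]] += 1          # "SN" -> "S", "ZN" -> "Z"
        if stat.startswith("Z"):      # Z marks a defunct (zombie) process
            zombies.append(pid)
    return states, zombies

# Rows taken from the listing above, one process per line.
SAMPLE = [
    "skribe   21348  0.0  0.5 224256 5204 ?       SN   Jun07   0:00 /home/skribe/java",
    "skribe   21352  0.0  0.0     0    0 ?        ZN   Jun07   0:00 [java <defunct>]",
    "skribe   21355  0.0  0.0     0    0 ?        ZN   Jun07   0:07 [java <defunct>]",
    "skribe   20572  0.0  0.5 224984 5704 ?       S    Jun08   0:00 /usr/local/java/b",
]

if __name__ == "__main__":
    states, zombies = summarize(SAMPLE)
    print("by state:", dict(states))
    print("zombie PIDs:", zombies)
```

Feeding it the real output (for example, piped in on stdin and split into
lines) would give a count of defunct entries that can be compared from
one day to the next.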

Chris, we've tried several different JVMs, all with the same result.
It's possible that it's bad JNI code in Orion 2.0, and it's also
possible that there's a problem with the libs on the particular server
we're using, but so far we haven't been able to isolate it.

You're right that ideally we should test on a similarly configured
machine, but unfortunately, this particular box is the one that we're
going to be running on in production. Actually, it's not really
unfortunate, but interesting. I'm intensely curious about what exactly
the problem is, and about the process for tracking it down and solving
it. I hope that whatever we find will be helpful to others, and I'll
share it once we've solved the problem.

If any other ideas strike you, please let me know!

Thanks again for your help!

Steve

PS Chris, <laugh> you've helped more than you know!!
 
Steve Jovanovic
Director of Engineering
Noumenaut Software
(262) 632-7755
 

-----Original Message-----
From: plug-discuss-admin@lists.plug.phoenix.az.us
[mailto:plug-discuss-admin@lists.plug.phoenix.az.us] On Behalf Of Huang
Haitao-G17843
Sent: Wednesday, June 04, 2003 11:11 AM
To: 'Steve Jovanovic'; java@phxjug.org;
plug-discuss@lists.plug.phoenix.az.us
Subject: RE: [phxjug-java] Help Wanted: Squashing JVM on Linux Bug

What distribution of Linux? 
If it is a desktop distribution rather than a server distribution, it
could be running into the file descriptor limit. Check ulimit and
/etc/security/limits.conf.
Haitao Huang
Motorola 
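
As a rough illustration of the descriptor check suggested above, the
sketch below reads the limit and the number of descriptors in use for
the process that runs it. It assumes a Linux /proc filesystem; the
function names are made up for the example, and the numbers describe
the Python process itself, not the JVM (a JVM started from the same
shell would normally inherit the same limit).

```python
import os
import resource

def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def open_fd_count(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only; other PIDs may need privileges)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def fd_headroom():
    """Descriptors still available to this process, or None if the soft limit is unlimited."""
    soft, _hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - open_fd_count()

if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("open now:", open_fd_count())
    print("headroom:", fd_headroom())
```

The soft limit is the value `ulimit -n` reports in the shell. If the
open count for the server process sits close to it, descriptor
exhaustion becomes a plausible suspect; if not, it can be ruled out.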

----

This probably won't be much help, but just in case, ...

You are getting a segmentation violation (signal 11) which means you are
attempting to access protected memory.  Since you cannot do this in 
Java, it is either:  1) a bug in the JVM,  2) buggy JNI code, or 3) a 
bad configuration of shared libraries on your machine.
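
To confirm from a wrapper script that a process really died of signal
11 rather than exiting on its own, something like the following Python
sketch could be used. It relies on `subprocess` reporting death by
signal N as a return code of -N on POSIX systems; the function names are
hypothetical, and the command passed to run_and_describe is whatever
launches your server.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        signum = -returncode  # negative return code means "killed by this signal"
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"

def run_and_describe(argv):
    """Run a command, wait for it to finish, and describe how it ended."""
    return describe_exit(subprocess.run(argv).returncode)

if __name__ == "__main__":
    # Demonstration with a harmless child process that exits with status 3.
    print(run_and_describe([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

On Linux, signal 11 is SIGSEGV, so a crashed child would be reported as
"killed by signal 11 (SIGSEGV)".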

The best thing you can do is attempt to narrow down the problem as much 
as possible.  Also check whether you can reproduce the problem on 
another Linux box that is not identically configured and is presumably 
in a stable configuration.

I would guess that it is not a bug in the JVM only because I have found 
the Linux 1.4.1 JVM to be very stable, but if you can find no other 
explanation, you should send it to Sun.  This won't help you much,
though.

Good luck,
Chris