Java Platform, Infrastructure Edition, part 2
Posted by rbpasker on April 17, 2008
Well, after getting a much-deserved woodshedding from Hal Hildebrand for misunderstanding his post on what he’s been struggling with, I have been thinking and chatting with people about what a “system programming” or “infrastructure” edition for Java might contain.
One of the characteristics that makes application infrastructure so unique is that it is a container for other people’s random code. Java generally does a pretty good job of dealing with loadable code by providing such features as class loaders, a code security model, a component security model, and a threading architecture.
There are, however, still a number of robustness issues with Java as it exists today:
- hot code reloading – This is an age-old robustness problem, especially for operational issues like rolling upgrades, and I think the Zero Turnaround guys may have a splendid general-purpose solution in their Java Rebel product.
- runaway thread healing – These are threads that go into an endless loop or a permanent I/O block. We used to have the ability to set an ExecuteThread timeout in WLS, whereby a watchdog timer would kill any thread that didn’t complete within a configurable time period. But then Sun deprecated Thread.stop() and suggested instead cooperative thread death using a state variable. This abdication of responsibility for robustness from the VM to user code is similar to the cooperative transaction manager timeout, about which my colleague Pete Holditch says:
There is no easy answer – there isn’t really a facility in J2SE or J2EE as they stand today to allow a thread to be safely and asynchronously terminated.
I’d like to see a permanent solution to this problem, even if it means implementing transactional memory in the JVM.
- memory quotas – Another great way to test an application server’s robustness is to leak memory. Providing a quota system that limits (hopefully, heuristically) the ability of a component to allocate memory would prevent bad code from killing the whole server with an OutOfMemoryError (OOME).
- deadlock management – Before you go hitting the “comment” button, note that I went through Distributed Lock Manager hell in VMS 25 years ago, so I know the pitfalls here. Nevertheless, Azul has done some great stuff (pdf) in this area, and I think it’s ripe for attention.
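To make the runaway-thread item concrete, here is a minimal sketch of roughly what a watchdog can do using only the standard java.util.concurrent API. The class name WatchdogSketch and the helper runWithTimeout are hypothetical names chosen for illustration; this is not how WLS implemented its ExecuteThread timeout. It assumes Java 8 or later for the lambda syntax.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of a cooperative watchdog; not taken from any product.
public class WatchdogSketch {

    /**
     * Runs the task and waits up to timeoutMillis for it to finish.
     * Returns true if it finished in time, false if the watchdog gave up on it.
     */
    static boolean runWithTimeout(ExecutorService pool, Runnable task, long timeoutMillis)
            throws Exception {
        Future<?> future = pool.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread. The task has to
            // notice the interrupt and exit on its own; nothing here can force it.
            future.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();

        // A well-behaved task that finishes immediately.
        Runnable quick = () -> { };

        // A "runaway" task that at least cooperates by polling its interrupt flag.
        Runnable cooperative = () -> {
            while (!Thread.currentThread().isInterrupted()) {
                // simulated endless work
            }
        };

        System.out.println("quick finished in time: " + runWithTimeout(pool, quick, 1000));
        System.out.println("runaway finished in time: " + runWithTimeout(pool, cooperative, 100));

        pool.shutdownNow();
    }
}
```

The catch is the comment in the timeout branch: a task that never checks its interrupt flag, or that is stuck in a blocking call which ignores interrupts, keeps its thread forever. That is the gap the bullet above is complaining about.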
So rather than just complaining, I’ve listed some real-life problems that a Java Platform, Infrastructure Edition could solve.