December 03, 2009

Comments

Mike Burke

I'm a fan of JMX, but I haven't found an out-of-the-box way to automatically persist metrics over an extended period for long-term trending, forecasting, etc.

Is there a missing piece of the puzzle out there that lets me capture selective (and aggregated) JMX stats into a persistent store (file, database, RPC, whatever)?

Kasper Nielsen

You should really make the code thread-safe. There are some serious issues and some minor ones.

A serious one:

public long getAvgTimeNS() {
    if (count.get() > 0)
    {
        return totalTimeNS.get() / count.get();
    }
This will fail with a division by zero if the counter is reset between the check on the first line and the division on the last, because count.get() would then return 0.
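
One way to close that window is to read the counter once and divide by the local copy. This is only a minimal sketch: it assumes `count` and `totalTimeNS` are `AtomicLong` fields, as the snippet suggests, and the class name `TimingStats` and the `record` and `reset` methods are hypothetical stand-ins rather than the code from the post.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the MBean in the post; only the field names
// count and totalTimeNS come from the quoted snippet.
final class TimingStats {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalTimeNS = new AtomicLong();

    void record(long durationNS) {
        totalTimeNS.addAndGet(durationNS);
        count.incrementAndGet();
    }

    void reset() {
        count.set(0);
        totalTimeNS.set(0);
    }

    long getAvgTimeNS() {
        // Read the counter once so the check and the division use the same value.
        long n = count.get();
        return n > 0 ? totalTimeNS.get() / n : 0;
    }
}
```

This removes the division by zero. The average can still be skewed if a reset lands between the two reads, since the two fields are not updated together.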

A minor one:
minTimeNS.set(Math.min(minTimeNS.get(), durationNS));

You should use compareAndSet in a for loop to make sure you don't miss any updates.
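
A compareAndSet loop along those lines might look like the sketch below. It assumes `minTimeNS` is an `AtomicLong`; the wrapper class `MinTracker` and its method names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper; only the field name minTimeNS comes from the post.
final class MinTracker {
    private final AtomicLong minTimeNS = new AtomicLong(Long.MAX_VALUE);

    void update(long durationNS) {
        for (;;) {
            long current = minTimeNS.get();
            if (durationNS >= current) {
                return; // the stored minimum is already at least as small
            }
            if (minTimeNS.compareAndSet(current, durationNS)) {
                return; // our value was installed
            }
            // Another thread changed the value in between; re-read and retry.
        }
    }

    long getMinTimeNS() {
        return minTimeNS.get();
    }
}
```

On Java 8 and later, `minTimeNS.accumulateAndGet(durationNS, Math::min)` performs the same retry loop internally.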


There are also some small inconsistency windows where you can get crazy readings because the reset functionality is not atomic.
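
One way to make reset atomic without `synchronized` is to keep all the statistics in a single immutable snapshot behind an `AtomicReference`, so a reader or a reset always sees a consistent set of values. This is a sketch of that idea, not the code from the post; `SnapshotStats` and `Snapshot` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SnapshotStats {
    // Immutable value object: all fields change together or not at all.
    static final class Snapshot {
        final long count;
        final long totalTimeNS;
        final long minTimeNS;

        Snapshot(long count, long totalTimeNS, long minTimeNS) {
            this.count = count;
            this.totalTimeNS = totalTimeNS;
            this.minTimeNS = minTimeNS;
        }

        long avgTimeNS() {
            return count > 0 ? totalTimeNS / count : 0;
        }
    }

    private static final Snapshot EMPTY = new Snapshot(0, 0, Long.MAX_VALUE);

    private final AtomicReference<Snapshot> state =
            new AtomicReference<Snapshot>(EMPTY);

    void record(long durationNS) {
        for (;;) {
            Snapshot old = state.get();
            Snapshot next = new Snapshot(old.count + 1,
                    old.totalTimeNS + durationNS,
                    Math.min(old.minTimeNS, durationNS));
            if (state.compareAndSet(old, next)) {
                return;
            }
            // Lost a race with another writer or a reset; retry on the new state.
        }
    }

    // Swaps in the empty snapshot and hands back the final pre-reset values.
    Snapshot reset() {
        return state.getAndSet(EMPTY);
    }

    Snapshot read() {
        return state.get();
    }
}
```

The trade-off is one small allocation per update and retries under heavy write contention, so it may not suit every hot path.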

Billy Newport

I did those deliberately, except for the / 0 one. People complain if it's synchronized, given there is sometimes only one MBean. I'll upload a new version.

Billy Newport

Kasper,
Fixed, thanks for keeping me honest...

William Louth

Billy, please stick to data/compute grids and leave management/monitoring to others, because, my friend, you clearly have not got a clue what is needed to really solve performance and scalability problems in production when you recommend a technology that is poorly conceived and implemented for the task at hand (especially in the coming years).

It does not even scale to the level of metric data collection required for anything more than a petshop application.

http://opencore.jinspired.com/?page_id=129

I have yet to see JMX solve anything but the lowest of hanging fruit and even then you would probably be quicker and more accurate asking a witch doctor for advice.

JMX was designed for legacy products from HP (OpenView) and IBM (Tivoli), and even there it fails to hit the mark, offering no efficient means of collection and no standardization of measurements that could be correlated in some intelligent manner.

William

William Louth

Kasper, reset functionality is a very bad practice in production. It's a developer feature. No one in operations should be allowed to effectively delete monitoring data. Because JMX is not transactional (and most of the time not even thread-safe), a reset just corrupts the data even more than it already is due to latency across multiple calls.

Marking is what is required.
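
One possible reading of "marking", offered only as an illustrative sketch and not as a description of any particular product: the counter is never cleared, and each observer records the current value as its own baseline, then reads deltas against it. The names `MarkableCounter`, `Mark`, and `sinceMark` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

final class MarkableCounter {
    // Monotonic total; nothing ever sets it back to zero.
    private final AtomicLong total = new AtomicLong();

    void add(long n) {
        total.addAndGet(n);
    }

    long total() {
        return total.get();
    }

    // The mark belongs to the observer, so one observer's mark
    // cannot wipe data that another observer is still reading.
    Mark mark() {
        return new Mark(total.get());
    }

    final class Mark {
        private final long baseline;

        private Mark(long baseline) {
            this.baseline = baseline;
        }

        long sinceMark() {
            return total.get() - baseline;
        }
    }
}
```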

Billy Newport

Will,
You're awesome, brought a smile to my face :) Your stuff is very cool, but it's not in the JDK. Most customers with performance problems have no instrumentation at all. The first thing they do is call support, because it's always a product problem until proven otherwise. Sometimes it is, but usually it's something else. Any increase in the amount of instrumentation people do is a good thing, and it would allow them and us to see what's going on. I'm just trying to encourage this using stuff that they already have. If they had your stuff and were using it, then they can happily ignore this blog post; they are already ahead of the game.

William Louth

OK, I tried showing some degree of restraint, but obviously it did not work.

- JMX is sufficient for simple control operations (stop/start/restart/gc/...) when these allow the user to make a series of transient state transitions across a number of services in a single operation. That said, I am very uncomfortable with attribute write operations because of the obvious lack of persistence (change management control) and transaction support (rarely are updates localized to one attribute).

- JMX serves as a basic (and somewhat primitive) read-only management interface for generic remote client consoles, as the cost of remote calls dwarfs the huge overhead in making local access calls. But at the same time its design makes the management consoles pretty much anemic (low data collection) and not terribly scalable (high latency), as attributes have to be pulled one by one. There are ways around some of the issues. Many, many ways. Actually, each release of JMX seems to bring a new approach (or MBean) in a futile effort to correct (mask) the original sin buried within its design.

If you are doing health status monitoring or high-level reporting in a well-tested and pretty stable (in terms of workload and execution behavior) system, then it is sufficient to build those big RED & GREEN circle dashboards. Anything else (is there anything not changing these days?) and you seriously need to get back to the dormitory and take your medication.

William

William Louth

I forgot to follow up on this point you raised.

"Any increase in the amount of instrumentation people do is a good thing and it would allow them and us to see whats going on."

Actually, this is the crux of the problem. Developers rarely have enough information to make this determination. It is best left to tools that do not guess hotspots but instead instrument and dynamically enable/disable resource metering based on accurate real-time resource usage profiles.

Granted, if you have not got this technology then anything is better than nothing, but let's strive to reach higher than such low-hanging fruit.

By the way, I did try hard to get Oracle, IBM and Sun behind my technology. The IBM team was just too busy doing CSI on dead JVMs.

The comments to this entry are closed.