Sunday, February 19, 2012

Java runtime platform: performance checkup

How does jHiccup help developers?

jHiccup helps you avoid common pitfalls in application performance characterization. Most performance reporting assumes a normal distribution of response times (single mode), yet most 'real' systems are multi-modal due to GC pauses, OS or virtualization 'hiccups', swapping, etc. jHiccup can help you find and identify the causes of these delays that affect response times.

While jHiccup does not measure application response times, application response time will inevitably include any stall times observed at the runtime platform level.
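
To make the idea of a platform-level stall concrete, here is a minimal sketch of how a hiccup could be observed: a thread sleeps for a short, fixed interval and records how much longer than requested it actually took to wake up. This is a simplified illustration, not jHiccup's actual source; the class name `HiccupSketch`, the method `maxHiccupNanos`, and the 1 msec interval are assumptions chosen for the example.

```java
import java.util.concurrent.TimeUnit;

// Simplified illustration of a hiccup measurement loop (hypothetical names,
// not taken from jHiccup itself).
class HiccupSketch {

    // Sleeps for intervalMs repeatedly and returns the largest overshoot seen,
    // in nanoseconds. An overshoot means the thread was not runnable on time,
    // which any application thread on the same runtime would also have felt.
    static long maxHiccupNanos(long intervalMs, int iterations) {
        long expected = TimeUnit.MILLISECONDS.toNanos(intervalMs);
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
            long elapsed = System.nanoTime() - start;
            // Anything beyond the requested sleep time counts as a hiccup.
            long hiccup = Math.max(0, elapsed - expected);
            worst = Math.max(worst, hiccup);
        }
        return worst;
    }

    public static void main(String[] args) {
        long worst = maxHiccupNanos(1, 1000);
        System.out.printf("worst hiccup: %.3f msec%n", worst / 1e6);
    }
}
```

Because the loop does no application work, whatever delay it sees comes from the platform (JVM, OS, hypervisor) rather than from application code.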

jHiccup data and Hiccup Charts can quickly identify whether application responsiveness issues are dominated by application code or by the underlying runtime platform's behavior:

If application response time behavior is similar to that observed by jHiccup, observed as a correlation of dominant stall events in time and of response time %'ile distribution and magnitude, the runtime platform's behavior and occasional stalls are likely the dominant factor in application responsiveness. Improvement in runtime stall behavior through tuning or eliminating runtime stall causes will most likely improve the application response time characteristics.
If application response time behavior does not correlate with jHiccup data and Hiccup Charts, the application code and/or external resources are most likely the dominant factor, and runtime tuning and improvements in the area of stalls and responsiveness are unlikely to result in improved application response behavior.
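
One rough way to check the two cases above is to compare the same percentile in both data sets. The sketch below assumes you have already extracted application response times and hiccup magnitudes (both in msec) into arrays; the class `HiccupComparison`, the method names, and the sample numbers are hypothetical and only illustrate the comparison, not any jHiccup feature.

```java
import java.util.Arrays;

// Hypothetical helper: compares application response times against hiccup
// magnitudes at the same percentile.
class HiccupComparison {

    // Nearest-rank percentile: the smallest sample such that at least
    // pct percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double pct) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    // Share of the response time at the given percentile that is accounted
    // for by the platform hiccup at the same percentile.
    static double stallShare(double[] responseMs, double[] hiccupMs, double pct) {
        double response = percentile(responseMs, pct);
        return response == 0 ? 0 : percentile(hiccupMs, pct) / response;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        double[] responseMs = {100, 200, 400, 800};
        double[] hiccupMs = {1, 2, 300, 400};
        System.out.println("share at median: " + stallShare(responseMs, hiccupMs, 50));
        System.out.println("share at max:    " + stallShare(responseMs, hiccupMs, 100));
    }
}
```

A share close to 1 at the high percentiles suggests the platform dominates; a share close to 0 points at application code or external resources. Where to draw the line is a judgment call, and the stalls should also line up in time before drawing conclusions.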

While jHiccup does not in itself attempt to identify the causes of these stalls, they strongly correlate in time with Garbage Collection events logged during the run. The smaller and more frequent events are related to young generation GC pauses, while the larger event correlates to an old generation, stop-the-world Full GC event in the logs.

General observations across various application types lead us to the following cause groupings:
Stalls at the handful-of-msec level, all the way up to the mid-tens of milliseconds, can be caused by scheduling issues and momentary load-sharing pressure (as demonstrated by the Idle App Hiccup Charts). Such stalls tend to be spread over time but rarely show a cleanly periodic nature.

Stalls in the 100s of msec, especially when they appear in clearly periodic patterns over time, are typically associated with "minor" or young-generation GC pause events.

Stalls in the multi-second range are typically associated with old-generation GC or Full GC pause events. While often separated by multiple minutes, they can be expected to periodically appear over long runs.
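
The groupings above can be sketched as a simple lookup on stall magnitude. The exact cutoffs (100 msec and 1000 msec) are assumptions picked to separate the three ranges; the text only gives rough magnitudes, and the class and enum names are hypothetical. Treat the result as a first guess to be confirmed against GC logs.

```java
// Hypothetical classifier mapping a stall magnitude to its most likely cause
// group. The cutoffs are illustrative assumptions, not jHiccup settings.
class StallClassifier {

    enum Cause { SCHEDULING_OR_LOAD, YOUNG_GEN_GC, OLD_GEN_OR_FULL_GC }

    static Cause classify(long stallMs) {
        if (stallMs >= 1000) {
            return Cause.OLD_GEN_OR_FULL_GC; // multi-second range
        }
        if (stallMs >= 100) {
            return Cause.YOUNG_GEN_GC;       // 100s of msec
        }
        return Cause.SCHEDULING_OR_LOAD;     // a few msec up to tens of msec
    }

    public static void main(String[] args) {
        long[] samplesMs = {5, 40, 350, 4200};
        for (long ms : samplesMs) {
            System.out.println(ms + " msec -> " + classify(ms));
        }
    }
}
```

Magnitude alone is not conclusive: whether the stalls repeat periodically matters as well, as noted in the list above.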

For example, looking at the Hiccup Charts above, it is clear that response times at the 99th percentile must have been at least 350-400 msec, as the application platform as a whole was stalled for periods of that length at that percentile.

How to run?

1) Download jHiccup and unzip it
2) Read the README file as needed, OR execute as follows if you want to check hiccups in your app server:

#jHiccup ~/bin/tomcat-startup
3) jHiccup will generate two logs, namely "hiccup.*.hgrm" and "hiccup.*"
4) Open jHiccupPlotter in Excel and enable macros:
     Click on "Developer" --> "Code" --> "Macro Security"
     Select "Enable all macros"
     If the "Developer" tab is not shown, click on the "Office Button" --> "Excel Options" --> select "Show Developer tab in the Ribbon"

That's it!!