Why is My Java Application Freezing Under Heavy I/O Load?

The Question

Recently, a customer asked us:

Why would heavy disk IO cause the Tungsten Manager and not MySQL to be starved of resources?

For example, we saw the following in the Manager log file tmsvc.log:

2019/06/03 00:50:30 | Pinging the JVM took 29 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 25 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 21 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 16 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 12 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 8 seconds to respond.

The Answer

Why a Java application might be slow or freezing

The short answer: when another process is writing heavily to the same filesystem, that background I/O can stall the Java Virtual Machine (JVM) during garbage collection (GC), and the whole application freezes until the stall clears.
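Stalls like the ones in the log excerpt can also be observed from inside a JVM. The following is a minimal sketch, not how the Manager's wrapper measures its ping: a thread sleeps for a fixed interval and records how much longer than requested each sleep took. The class name PauseDetector and the interval and sample values are assumptions chosen for illustration.

```java
// Hypothetical helper: detects application stalls by measuring sleep overshoot.
final class PauseDetector {

    /**
     * Sleeps for intervalMillis, samples times, and returns the largest
     * overshoot (actual sleep time minus requested time) in milliseconds.
     */
    static long maxOvershootMillis(long intervalMillis, int samples) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            // A stop-the-world pause delays the wake-up, so it shows up here.
            worst = Math.max(worst, elapsedMillis - intervalMillis);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Assumed values: sample every 100 ms for roughly 10 seconds.
        long worst = maxOvershootMillis(100, 100);
        System.out.println("Largest stall observed: " + worst + " ms");
    }
}
```

An overshoot of a few milliseconds is normal scheduling noise. Overshoots measured in seconds point to the kind of freeze described here, although this sketch alone cannot tell a GC pause apart from other causes such as CPU starvation or swapping.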

This problem is not specific to Continuent Tungsten products.

The following article from LinkedIn engineering explains the issue very well (and far better than I could - well done, and thank you):

https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic

Below is a quote from the above article (without permission, thank you):

Latency-sensitive Java applications require small JVM GC pauses. However, the JVM can be blocked for substantial time periods when disk IO is heavy. These are the factors involved:

  1. JVM GC needs to log GC activities by issuing write() system calls;
  2. Such write() calls can be blocked due to background disk IO;
  3. GC logging is on the JVM pausing path, hence the time taken by write() calls contribute to JVM STW pauses.
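The second factor in the list is easy to probe on a given host. Here is a rough sketch, under the assumption that small appends to an ordinary file are a fair stand-in for GC log writes: it times each write and reports the slowest one. The class name WriteLatencyProbe, the record contents, and the sample count are all made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: times small appends to a file, similar in size to GC log lines.
final class WriteLatencyProbe {

    static final byte[] RECORD = "gc-log-like record\n".getBytes(StandardCharsets.US_ASCII);

    /** Appends count records to file and returns each write's latency in nanoseconds. */
    static long[] measure(Path file, int count) {
        long[] latencies = new long[count];
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < count; i++) {
                ByteBuffer buf = ByteBuffer.wrap(RECORD);
                long start = System.nanoTime();
                while (buf.hasRemaining()) {
                    ch.write(buf); // plain write to the OS; no fsync is requested
                }
                latencies[i] = System.nanoTime() - start;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return latencies;
    }

    /** Returns the largest value, or 0 for an empty array. */
    static long max(long[] values) {
        long max = 0;
        for (long v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    /** Creates a scratch file in the default temporary directory. */
    static Path tempFile() {
        try {
            return Files.createTempFile("write-probe", ".log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a path on the filesystem under test, or default to a temp file.
        Path file = args.length > 0 ? Paths.get(args[0]) : tempFile();
        long[] latencies = measure(file, 1000);
        System.out.printf("max write latency: %.3f ms%n", max(latencies) / 1_000_000.0);
    }
}
```

On an idle disk these writes normally land in the page cache and return quickly. Running the probe against the busy filesystem while the backup or other heavy job is active should show whether individual writes are being held up.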

The Solution

So what may be done to alleviate the problem?

You have several options:

  • Move the GC log to a separate disk to reduce I/O contention, as described in the article above.
  • Move backups or NFS-intensive jobs to another node.
  • Unmount any NFS volumes and use rsync to copy files to an admin host that handles the NFS writes (i.e., move the mount to an external host).

Again, I quote from the LinkedIn engineering article above (without permission, thank you again):

One solution is to put GC log files on tmpfs (i.e., -Xloggc:/tmpfs/gc.log). Since tmpfs does not have disk file backup, writing to tmpfs files does not incur disk activities, hence is not blocked by disk IO. There are two problem with this approach: (1) the GC log file will be lost after system crashes; and (2) it consumes physical memory. A remedy to this is to periodically backup the log file to persistent storage to reduce the amount of the loss.

Another approach is to put GC log files on SSD (Solid-State Drives), which typically has much better IO performance. Depending on the IO load, SSD can be adopted as a dedicated drive for GC logging, or shared with other IO loads. However, the cost of SSD needs to be taken into consideration.

Cost-wise, rather than using SSD, a more cost-effective approach is to put GC log file on a dedicated HDD. With only the IO activity being the GC logging, the dedicated HDD likely can meet the low-pause JVM performance goal.
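Before moving the GC log, it helps to confirm where a JVM is writing it today. The sketch below is one possible way to check, not an official tool: it reads the JVM's own startup arguments, looks for -Xloggc: (or the file= output of -Xlog:gc on JDK 9 and later), and prints the type of the filesystem holding that directory. The class name GcLogLocation is hypothetical, and the -Xlog parsing is deliberately simplified.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: reports which filesystem the current JVM's GC log lives on.
final class GcLogLocation {

    /** Extracts the GC log path from a list of JVM arguments, if one is set. */
    static Optional<String> gcLogPath(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xloggc:")) {
                return Optional.of(arg.substring("-Xloggc:".length()));
            }
            // Simplification: only handles -Xlog:gc... with an explicit file= output,
            // and assumes the path itself contains no ':' (not true on Windows).
            if (arg.startsWith("-Xlog:gc")) {
                int idx = arg.indexOf("file=");
                if (idx >= 0) {
                    String rest = arg.substring(idx + "file=".length());
                    int end = rest.indexOf(':');
                    return Optional.of(end >= 0 ? rest.substring(0, end) : rest);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Optional<String> logPath = gcLogPath(jvmArgs);
        if (!logPath.isPresent()) {
            System.out.println("No GC log file found in the JVM arguments.");
            return;
        }
        // The log file may not exist yet, so inspect its parent directory instead.
        Path dir = Paths.get(logPath.get()).toAbsolutePath().getParent();
        if (dir == null) {
            System.out.println("Cannot determine the directory of " + logPath.get());
            return;
        }
        FileStore store = Files.getFileStore(dir);
        System.out.println("GC log: " + logPath.get());
        System.out.println("Filesystem: " + store.name() + " (type " + store.type() + ")");
    }
}
```

Run it with the same JVM options as the application. On Linux, a log directory on tmpfs would typically be reported with type tmpfs, while a type such as ext4 or xfs means the log shares a regular disk-backed filesystem and is a candidate for relocation.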

Summary

The Wrap-Up

In this blog post we discussed why Java applications freeze or are slow under heavy I/O load and what may be done about it.

To learn about Continuent solutions in general, check out https://www.continuent.com/solutions

The Library

Please read the docs!

For more information about Tungsten clusters, please visit https://docs.continuent.com

Tungsten Clustering is the most flexible, performant global database layer available today - use it underlying your SaaS offering as a strong base upon which to grow your worldwide business!

For more information, please visit https://www.continuent.com/solutions

Want to learn more or run a POC? Contact us

About the Author

Eric M. Stone
COO

Eric is a veteran of fast-paced, large-scale enterprise environments with 35 years of Information Technology experience. With a focus on HA/DR, from building data centers and trading floors to worldwide deployments, Eric has architected, coded, deployed and administered systems for a wide variety of disparate customers, from Fortune 500 financial institutions to SMBs.
