This page last changed on Dec 03, 2009 by jlargman.
This document relates broadly to memory management with Sun's Hotspot JVM. These are recommendations based on Support's successful experiences with customers and their large Confluence instances.

Summary

  • Set the Eden space to 30-50% of the overall heap: -XX:NewSize=<up to half of your -Xmx value>
  • Use a parallel collector: -XX:+UseParallelOldGC (note the Old: this parallelizes collection of the Tenured generation, not just Eden)
  • Disable explicit garbage collection triggered from distributed remote clients (-XX:+DisableExplicitGC)
  • Set the minimum (-Xms) and maximum (-Xmx) heap sizes to the same value (eg. -Xms1024m -Xmx1024m)
  • Turn on GC logging (add the flags -verbose:gc -Xloggc:<full-path-to-log> -XX:+PrintGCTimeStamps -XX:+PrintGCDetails) and submit the logs in a support ticket
  • Use Java 1.6
  • Read below if heap > 2G
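Taken together, the summary items above amount to a single options string passed to the JVM at startup. A minimal sketch follows; the heap size (1024m) and GC log path are placeholders, not recommendations for your server.

```shell
# Example startup options assembled from the summary above. The heap size
# and GC log path are placeholders -- size and locate them for your server.
HEAP="1024m"
GC_LOG="/var/log/confluence/gc.log"   # placeholder path
JAVA_OPTS="-Xms${HEAP} -Xmx${HEAP} \
-XX:NewSize=512m \
-XX:+UseParallelOldGC \
-XX:+DisableExplicitGC \
-verbose:gc -Xloggc:${GC_LOG} -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
echo "${JAVA_OPTS}"
```

On a Tomcat installation these would typically be exported via CATALINA_OPTS or JAVA_OPTS in the startup script; see Configuring System Properties for the supported mechanism.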

See Configuring System Properties for how to add these properties to your environment.

Background

Performance problems in Confluence generally manifest themselves in either:

  • frequent or infrequent periods of viciously sluggish responsiveness, which either require a manual restart or from which the application eventually, almost inexplicably, recovers
  • some event or action triggering a non-recoverable memory debt, which in turn develops into an application-fatal death spiral (eg. "GC overhead limit exceeded", or Out-Of-Memory)
  • generally consistent poor overall performance across all Confluence actions

There is a wealth of simple tips and tricks that can be applied to Confluence, with tangible benefits to the long-term stability, performance and responsiveness of the application.

Why this happens

Confluence is basically a gel. Multiple applications, data types, social networks and business requirements can be efficiently amalgamated, leading to more effective collaboration. The real beauty of Confluence, however, is its ability to mold itself to your organization's DNA - your existing business and cultural processes - rather than the other way around, with your organization having to adapt to how the software product works.

The flip side of this flexibility is having many competing demands placed on Confluence by its users. This makes for an extraordinarily broad and deep set of functions whose usage, practically speaking, cannot be predicted for individual installations.

The best mechanism to protect the installation is to place Confluence on a foundation where it is fundamentally more resilient and able to react and cope with competing user requirements.

Appreciate how Confluence and the Java JVM use memory

The Java memory model is comparatively naive. A Unix process has four intensive decades of development behind its time-slicing, inter-process communication and intelligent deadlock avoidance; the Java thread model has ten years at best under its belt. Because Java runs on a virtual machine, idiosyncrasies of the platform Confluence runs on can also influence how the JRE behaves. As a result it is sometimes necessary to tune the JVM parameters to give it a "hint" about how it should behave.

There are circumstances in which the JVM will take the 'mediocre' option with respect to resource contention and allocation, and 'struggle' along with ofttimes highly impractical goals. For example, the JRE will be quite happy to perform at 5 or 10% of optimum capacity if it means overall application stability and integrity can be ensured. This often translates into periods of extreme sluggishness, which effectively means the application is neither stable nor available.

This is mainly because Java shouldn't make assumptions about what kind of runtime behavior an application needs; its charter is to assume 'business-as-usual' across a wide range of scenarios and only react in 'dire' circumstances.

Memory is contiguous

The Java memory model requires that memory be allocated in a contiguous block. This is because the heap has a number of side data structures which are indexed by a scaled offset (ie n*512 bytes) from the start of the heap. For example, updates to references on objects within the heap are tracked in these "side" data structures.

Consider the differences between:

  1. Xms (the allocated portion of memory)
  2. Xmx (the reserved portion of memory)

Allocated memory is a fully backed, memory-mapped physical allocation to the application. The application owns that segment of memory.

Reserved memory (the difference between Xms and Xmx) is reserved for use but not physically mapped (or backed) by memory. In the 4G address space of a 32-bit system, for example, the reserved segment can be used by other applications; but because Java requires contiguous memory, if the reserved memory it requests is occupied, the OS must move the occupying pages out of the way, either to another unused segment or, more painfully, to disk.
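This distinction is the reasoning behind the summary recommendation to set -Xms and -Xmx to the same value. A minimal sketch, assuming a placeholder 1024m heap:

```shell
# With Xms equal to Xmx, the whole heap is committed (backed) at startup,
# so the JVM never has to grow into reserved-but-unbacked address space
# that the OS may since have handed to another mapping.
XMS="1024m"
XMX="1024m"
echo "-Xms${XMS} -Xmx${XMX}"
```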

Permanent Generation memory is also contiguous. The net effect is even if the system has vast quantities of cumulative free memory, Confluence demands contiguous blocks, and consequently undesirable swapping may occur if segments of requested size do not exist. See Causes of OutOfMemoryErrors for more details.

Please be sure to position Confluence within a server environment that can successfully accommodate all competing requirements (operating system, contiguous memory, other applications, swap, and Confluence itself).

Figure out which (default) collector implementation your vendor is using

Vendors' default JVM implementations are subtly different on paper, but in production can differ enormously.

Sun's JVM by default splits the heap into three spaces:

  1. Eden (Nursery, or Scavenger)
  2. Tenured (Old)
  3. Permanent Generation (classes & library dependencies)

Objects are central to the operation of Confluence. When a request is received, the Java runtime will create new objects to fulfill the request in the Eden Space. If, after some time, those objects are still required, they may be moved to the Tenured (Old) space. But, typically, the overwhelming majority of objects created die young, within the Eden Space. These are objects like method local references within a while or for loop, or Iterators for scanning through Collections or Sets.

In IBM J9, however, the default policy is a single contiguous space - one large heap. The net effect is that for large WebSphere environments, garbage collection can be terribly inefficient, and liable to suffer outages during peak periods.

For larger instances with performance issues, it is recommended to tune Confluence such that there is a large Eden space, up to 50% of the overall size of the heap.

The command-line parameter is -XX:NewSize=XXXm, where XXX is the size in megabytes; -XmnXXXm can be used interchangeably. Eg. -XX:NewSize=700m or -Xmn700m.

By setting a larger NewSize, the net effect is that the JRE will spend less time garbage collecting, clearing dead memory references, compacting and copying memory between spaces, and more time doing actual work.
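The sizing arithmetic above can be sketched in a startup script; HEAP_MB is a placeholder for your -Xmx value in megabytes, and 50% is the upper bound of the Eden guidance given earlier.

```shell
# Derive a NewSize (Eden) value at 50% of the heap, per the guidance above.
HEAP_MB=1400                 # placeholder: your -Xmx in megabytes
NEW_MB=$((HEAP_MB / 2))      # Eden at half the heap
echo "-Xms${HEAP_MB}m -Xmx${HEAP_MB}m -XX:NewSize=${NEW_MB}m"
```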

Use the Parallel Garbage Collector

Out of the box, Confluence on Sun's default settings uses the serial garbage collector on the full Tenured heap. The Eden space is collected in parallel, but the Tenured space is not. This means that under load, if a full collection occurs, all application threads other than the garbage collector thread are taken off the CPU, because the event is a 'stop-the-world' serial event. This can have severe consequences if requests continue to accrue during these 'outage' periods. As a rough guide, for every gigabyte of heap allocated, allow a full second (exclusive) to collect.

If we parallelize the collector on a multi-core/multi-cpu architecture instance, we not only reduce the total time of collection (down from whole seconds to fractions of a second) but we also improve the resiliency of the JRE in being able to recover from high-demand occasions.

Additionally, Sun provides the Concurrent Mark-Sweep (CMS) collector (-XX:+UseConcMarkSweepGC), which is optimized for higher-throughput, server-grade instances. As a general rule the parallel collector (-XX:+UseParallelOldGC) is suitable for most installations; if you are keen to extract the best performance available, the CMS collector is an option.

Disable remote (distributed) garbage collection by JAVA clients

Many clients integrate third-party or their own custom applications to interrogate or add content to Confluence via its RPC interface. The Distributed Garbage Collector in the client uses RMI to trigger a remote GC event in the server, Confluence. Unfortunately, as of this writing, a System.gc() call via this mechanism triggers a full, serial collection of the entire Confluence heap (as it needs to remove references to remote client objects in its own object graph). This is a deficiency in the configuration and/or implementation of the JVM, and has the potential to impact severely if the remote client is poorly written or operating within a constricted JVM.

This can be disabled by using the flag -XX:+DisableExplicitGC at startup.
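A gentler alternative (our suggestion, not a Support recommendation from the text above) is to keep explicit GC but stretch the RMI distributed-GC interval so the remote trigger fires rarely. Both options sketched, with a placeholder one-day interval:

```shell
# Option 1: make System.gc() (including RMI-triggered calls) a no-op.
OPT_DISABLE="-XX:+DisableExplicitGC"
# Option 2: keep explicit GC but stretch the RMI distributed-GC interval
# (values in milliseconds; 86400000 ms = once a day, a placeholder choice).
OPT_RMI="-Dsun.rmi.dgc.client.gcInterval=86400000 -Dsun.rmi.dgc.server.gcInterval=86400000"
echo "${OPT_DISABLE}"
echo "${OPT_RMI}"
```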

Virtual Machines are Evil

VMware virtual machines, whilst extremely convenient (and obviously a very strong growth segment of the industry), cause particular problems for Java applications: it is very easy for host operating system resource constraints (eg. temporary memory shortfalls, or I/O swapping) to cascade into the Java VM and manifest as extremely unusual, frustrating and seemingly illogical problems. We already document some disk I/O metrics with VMware images, and although we now officially support the use of virtual instances, we do not recommend them unless maintained correctly.

This is not to say that VMware instances cannot be used, but they must be used with due care, proper maintenance and configuration. If you are reading this document because of poor performance, the first port of call should be to remove any virtualization: emulation will never beat the real thing, and at the least it introduces black-box variability into the system.

Use Java 1.6

Java 1.6 is generally regarded, via public discussion, to have an approximate 20% performance improvement over 1.5, and our own internal testing found this statistic credible. 1.6 is compatible with all supported versions of Confluence, and we strongly recommend that installations not using 1.6 migrate.

Use -server flag

The HotSpot server JVM has specific code-path optimizations which yield an approximate 10% gain over the client version. Most installations will already have this selected by default, but it is still wise to force it with -server, especially on some Windows machines.

If using 64bit JRE for larger heaps, use CompressedOops

For every JDK release Sun also builds a "Performance" branch in which specifically optimized performance features can be enabled; it is available on the Sun Java SE page after a brief survey. These builds are certified production grade.

Some blogs have suggested a 25% performance gain and a reduction in heap size when using this parameter. The use and function of the -XX:+UseCompressedOops parameter is discussed in more depth on Sun's official wiki (coincidentally, running on Confluence!).

Use NUMA if on SPARC, Opteron or recent Intel (Nehalem or Tukwila onwards)

The -XX:+UseNUMA flag enables the Java heap to take advantage of Non-Uniform Memory Architectures: Java will place data structures in memory locations closest to the processor running the thread that owns and operates on them. Depending on the environment, gains can be substantial. Intel markets its NUMA interconnect as QuickPath Interconnect™.

Use 32bit JRE if Heap < 2G

Using a 64-bit JRE when the heap is under 2G will hurt substantially in both heap footprint and performance, as nearly every object, reference, primitive, class and variable costs twice as much address space.

A 64-bit JRE/JDK is only recommended if heaps greater than 2G are required. If so, use CompressedOops.
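For illustration, a 64-bit options line combining a large heap with compressed oops; the 4096m figure is a placeholder, and the flag assumes one of the Java 6 performance builds mentioned above.

```shell
# Placeholder 4G heap on a 64-bit JRE; -XX:+UseCompressedOops narrows
# object references back to 32 bits for heaps that fit its limits.
OPTS_64BIT="-Xms4096m -Xmx4096m -XX:+UseCompressedOops"
echo "${OPTS_64BIT}"
```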

JVM coredumps can be triggered by memory pressure

If your instance of Confluence is throwing Java coredumps, it is known that memory pressure and space/generation sizings can influence the frequency and/or occurrence of this phenomenon.

If your Tomcat process completely disappears and the logs record similar to:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0xfe9bb960, pid=20929, tid=17
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_01-b08 mixed mode)
# Problematic frame:
# V  [libjvm.so+0x1bb960]
#

---------------  T H R E A D  ---------------

Current thread (0x01a770e0):  JavaThread "JiraQuartzScheduler_Worker-1" [_thread_in_vm, id=17]

siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x00000000

Registers:
 O0=0xf5999882 O1=0xf5999882 O2=0x00000000 O3=0x00000000
 O4=0x00000000 O5=0x00000001 O6=0xc24ff0b0 O7=0x00008000
 G1=0xfe9bb80c G2=0xf5999a48 G3=0x0a67677d G4=0xf5999882
 G5=0xc24ff380 G6=0x00000000 G7=0xfdbc3800 Y=0x00000000
 PC=0xfe9bb960 nPC=0xfe9bb964

We recommend a bug report be submitted to the JVM vendor, but in Support's experience, resource contention issues and core dumps are tightly coupled.

Establish useful monitoring techniques

At all times, the best performance tuning recommendations are based on current, detailed metrics. This data is easily made available and configurable, and helps us at Atlassian tremendously when diagnosing reported performance regressions.

  1. Enable JMX monitoring
  2. Enable Confluence access logging
  3. Enable garbage collection logging
  4. Take thread dumps at the time of regression. If you can't get into Confluence, you can take one externally.
  5. jmap can take a memory dump of the running JVM with relatively little impact on the application. Syntax: jmap -dump:format=b,file=heap.hprof <process_id> on Java 6 (jmap -heap:format=b <process_id> on Java 5).
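The thread and heap dumps above are easily scripted. A sketch, assuming a Java 6 JDK on the path; the pid is a placeholder and the commands are shown (echoed) rather than run, since they must target your live Confluence process:

```shell
# Placeholder pid -- substitute the Confluence/Tomcat process id (eg. from ps or jps).
PID=12345
# Thread dump to a timestamped file (Java 6 jstack):
echo "jstack ${PID} > threads-$(date +%Y%m%d-%H%M%S).txt"
# Binary heap dump, readable by VisualVM or jhat (Java 6 jmap):
echo "jmap -dump:format=b,file=heap-${PID}.hprof ${PID}"
```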

Great tools available include:

  • The excellent VisualVM (see its documentation).
  • Thread Dump Analyzer - great all round thread debugging tool, great for identifying deadlocks.
  • Samurai, an excellent alternative thread analysis tool, great for iterative dumps over a period of time.
  • GC Viewer - getting a bit long in the tooth, but is a good mainstay for GC analysis.
  • GChisto - A new GC analysis tool written by members of the Sun Garbage Collection team.

Documentation:

  • Sun's White Paper on Garbage Collection in JAVA 6.
  • Sun's state-of-the-art JavaOne 2009 session on garbage collection (registration required).
  • IBM stack: Java 5 GC basics for WebSphere Application Server.
  • An Excellent IBM document covering native memory, thread stacks, and how these influence memory constricted systems. Highly recommended for additional reading.
  • The complete list of JRE 6 options

Atlassian recommends at the very least getting VisualVM up and running (you will need JMX enabled), and adding access and garbage collection logging.

Tuning the frequency of full collections

The JVM will generally only collect the full heap when it has no other alternative, because of the relative size of the Tenured space (it is typically larger than Eden) and the natural probability that objects in Tenured are not eligible for collection (ie. they are still alive).

Some installations can trundle along only ever collecting in the Eden space. As time goes on, some objects will survive the initial Eden object slaughterhouse and be promoted to Tenured. At some point they will be dereferenced and no longer reachable through the object graph. However, the occupied memory will still be held in limbo as "dead" memory until a collection occurs in the Tenured space to clear and compact it.

It is not uncommon for moderately sized Confluence installations to reclaim as much as 50% of the current heap size on a full collection, precisely because full collections occur so infrequently. Reducing the occupancy-fraction trigger means more memory is free at any given time, so less swapping and fewer emergency collections occur when memory is actually needed, ie. during the busy hour.
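As a concrete sketch of such a trigger (assuming the CMS collector discussed earlier is in use; 60 is an illustrative value, not a tested recommendation):

```shell
# Start a concurrent Tenured collection at 60% occupancy, rather than
# waiting for the JVM's own estimate of when it must collect.
CMS_OPTS="-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=60 \
-XX:+UseCMSInitiatingOccupancyOnly"
echo "${CMS_OPTS}"
```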

Atlassian would classify frequency tuning on collections as an advanced topic for further experimentation, and is provided for informational purposes only. Unfortunately, it's impractical for Atlassian to support this degree of 'ergonomics'.

Performance Tuning works

Atlassian has a number of high-profile, and some extremely demanding, mission-critical clients who have successfully, through some trial and error, applied these recommendations to production instances and materially improved them. For any more information or guidance, please lodge a support case with the relevant information.

Document generated by Confluence on Dec 10, 2009 18:41