Confluence 3.5 : Garbage Collector Performance Issues
This page last changed on Feb 15, 2011 by jlargman.
Summary
See Configuring System Properties for how to add these properties to your environment.

Background

Performance problems in Confluence, and in rarer circumstances in JIRA, generally manifest as either OutOfMemoryErrors or periods of extreme sluggishness.
There is a wealth of simple tips and tricks that can be applied to Confluence, with a significant, tangible benefit to the long-term stability, performance and responsiveness of the application.
Why Bad Things Happen

Confluence is basically a gel. Multiple applications, data types, social networks and business requirements can be efficiently amalgamated, leading to more effective collaboration. The real beauty of Confluence, however, is its agility to mold itself to your organization's DNA - your existing business and cultural processes - rather than the other way around, with your organization having to adapt to how the software product works. The flip side of this flexibility is the many competing demands placed on Confluence by its users. This is an extraordinarily broad and deep set of functions that, practically speaking, cannot be predicted for individual use cases. The best mechanism to protect the installation is to place Confluence on a foundation where it is fundamentally more resilient and able to react and cope with competing user requirements.

Appreciate how Confluence and the Java JVM use memory

The Java memory model is naive. Compared to a Unix process, which has four intensive decades of development behind its time-slicing, inter-process communication and intelligent deadlock avoidance, the Java thread model has ten years at best under its belt. Because Java runs on a virtual machine, idiosyncrasies of the platform Confluence runs on can also influence how the JRE behaves. As a result it is sometimes necessary to tune JVM parameters to give it a "hint" about how it should behave. There are circumstances where the JVM will take a mediocre option with respect to resource contention and allocation, and struggle along with oftentimes highly impractical goals. For example, the JRE will be quite happy to perform at 5 or 10% of optimum capacity if it means overall application stability and integrity can be ensured. This often translates into periods of extreme sluggishness, which effectively means the application is neither stable nor accessible.
This is mainly because Java shouldn't make assumptions about what kind of runtime behavior an application needs; its charter is to assume 'business as usual' across a wide range of scenarios and to react only in dire circumstances.

Memory is contiguous

The Java memory model requires that the heap be allocated in a contiguous block. This is because the heap has a number of side data structures which are indexed by a scaled offset (i.e. n*512 bytes) from the start of the heap. For example, updates to references on objects within the heap are tracked in these side data structures. Consider the differences between allocated and reserved memory:
Allocated memory is fully backed: a memory-mapped physical allocation that the application now owns. Reserved memory (the difference between Xms and Xmx) is reserved for use, but not physically mapped (backed) by memory. This means that, for example, in the 4GB address space of a 32-bit system, the reserved segment can be used by other applications; but because Java requires contiguous memory, if the reserved memory is occupied when requested, the OS must swap that memory out of the reserved space - either to another unused segment or, more painfully, to disk. Permanent Generation memory is also contiguous. The net effect is that even if the system has vast quantities of cumulative free memory, Confluence demands contiguous blocks, and undesirable swapping may occur if segments of the requested size do not exist. See Causes of OutOfMemoryErrors for more details. Be sure to position Confluence within a server environment that can satisfy all the competing requirements: operating system, contiguous memory, other applications, swap, and Confluence itself.

Figure out which (default) collector implementation your vendor is using

Default JVM vendor implementations are subtly different, but in production can differ enormously. Sun by default splits the heap into three spaces: Eden (New), Tenured (Old) and Permanent Generation.
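One practical consequence of the contiguity requirement: growing the heap later needs contiguous address space that may no longer be free, so it is common to set -Xms equal to -Xmx and commit the entire heap at startup. A minimal sketch for a Tomcat setenv.sh; the sizes are illustrative assumptions, not a recommendation for your environment:

```shell
# Commit the full heap at startup by making the initial size (-Xms) equal
# to the maximum (-Xmx), so the JVM never has to grow into the reserved,
# unbacked region later. Sizes below are examples only.
CATALINA_OPTS="-Xms1024m -Xmx1024m -XX:MaxPermSize=256m"
export CATALINA_OPTS
```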
Objects are central to the operation of Confluence. When a request is received, the Java runtime creates new objects to fulfill the request in the Eden Space. If, after some time, those objects are still required, they may be moved to the Tenured (Old) space. Typically, though, the overwhelming majority of objects created die young, within the Eden Space - objects like method-local references within a while or for loop, or Iterators for scanning through Collections or Sets. In IBM J9, by contrast, the default policy is a single, contiguous space - one large heap. The net effect is that for large WebSphere environments, garbage collection can be terribly inefficient, and capable of suffering outages during peak periods.
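To see which collector and generation sizes your vendor JVM has actually chosen, later Sun/Oracle JREs can print their resolved flag values. A hedged sketch - the -XX:+PrintFlagsFinal option is a Sun/Oracle HotSpot feature (available from roughly Java 6u21) and output varies by vendor:

```shell
# Print the JVM's resolved defaults and pick out the collector and
# generation-size flags. Guarded so the probe is a no-op on machines
# without a java binary; the grep may legitimately match nothing.
if command -v java >/dev/null 2>&1; then
  java -XX:+PrintFlagsFinal -version 2>/dev/null \
    | grep -E 'NewSize|UseParallelOldGC|UseConcMarkSweepGC' || true
fi
```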
Set a larger NewSize

-XX:NewSize=XXXm, where XXX is the size in megabytes, is the command-line parameter; -XmnXXXm can be used interchangeably, e.g. -XX:NewSize=700m or -Xmn700m. By setting a larger NewSize, the net effect is that the JRE spends less time garbage collecting - clearing dead memory references, compacting and copying memory between spaces - and more time doing actual work.

Use the Parallel Garbage Collector

Confluence out of the box, with Sun Java's defaults, uses the serial garbage collector on the full Tenured heap. The Eden space is collected in parallel, but the Tenured space is not. This means that if a full collection occurs under load, all application threads other than the garbage collector thread are taken off the CPU for the duration of this 'stop-the-world' serial event. This can have severe consequences if requests continue to accrue during these outage periods. As a rough guide, allow a full second (exclusive) to collect for every gigabyte of allocated memory. If we parallelize the collector on a multi-core/multi-CPU instance, we not only reduce the total collection time (from whole seconds down to fractions of a second) but also improve the resiliency of the JRE in recovering from high-demand occasions. Additionally, Sun provides the Concurrent Mark-Sweep (CMS) collector (-XX:+UseConcMarkSweepGC), which is optimized for higher-throughput, server-grade instances. As a general rule, the Parallel Collector (-XX:+UseParallelOldGC) is the right choice for JIRA or Confluence installations, unless otherwise advised by support.

Restrict the ability of Tomcat to 'cache' incoming requests

Quite often the fatal blow is struck by the backlog of accumulated web requests while some critical resource (say, the index) is held hostage by a temporary, expensive job.
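Combining the two suggestions above, a sketch of the extra JVM options; the 700m young-generation size is the illustrative figure from the text, not a universal recommendation:

```shell
# Enlarge the young (Eden/New) generation and collect the Tenured
# generation in parallel rather than serially.
JAVA_OPTS="${JAVA_OPTS:-} -XX:NewSize=700m -XX:+UseParallelOldGC"
export JAVA_OPTS
```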
Even if the instance is busy garbage collecting under load, Tomcat will still accept new HTTP requests and cache them internally, and the operating system beneath is also buffering incoming requests in the socket for Tomcat to pick up the next time it gets the CPU.

    <Connector port="8080" protocol="HTTP/1.1"
        maxHttpHeaderSize="8192" maxThreads="150"
        minSpareThreads="25" maxSpareThreads="75"
        useBodyEncodingForURI="true" enableLookups="false"
        redirectPort="8443" acceptCount="100"
        connectionTimeout="20000" disableUploadTimeout="true"/>

Here the Tomcat Connector is configured with 150 maxThreads and an acceptCount of 100. This means up to 150 threads will awaken to accept (but, importantly, not necessarily complete) web requests during performance outages, and 100 more requests will be queued for processing when threads become available. That is 250 concurrent or queued requests, many of which can be quite expensive in and of themselves. Java will attempt to juggle all these threads concurrently and become extremely inefficient at doing so, exacerbating the garbage collection performance issue. Resolution: reduce maxThreads and acceptCount to something slightly higher than normal busy-hour demands.

Disable remote (distributed) garbage collection by Java clients

Many clients integrate third-party or their own custom applications to interrogate, or add content to, Confluence via its RPC interface. The Distributed Garbage Collector in the client uses RMI to trigger a remote GC event in the Confluence server. Unfortunately, as of this writing, a System.gc() call via this mechanism triggers a full, serial collection of the entire Confluence heap (as it needs to remove references to remote client objects in its own object graph). This is a deficiency in the configuration and/or implementation of the JVM, and it has the potential to cause severe impact if the remote client is poorly written or operating within a constricted JVM.
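If the client cannot be changed, one server-side mitigation is to lengthen the RMI distributed-GC interval so these explicit collections fire far less often. These are standard Sun JRE system properties; the one-week value below is an illustrative assumption, not an Atlassian recommendation:

```shell
# Stretch the RMI Distributed GC interval (in milliseconds) from the
# Java 6 default of one hour to one week, on both the client and server
# sides of the RMI connection.
JAVA_OPTS="${JAVA_OPTS:-} -Dsun.rmi.dgc.client.gcInterval=604800000 \
-Dsun.rmi.dgc.server.gcInterval=604800000"
export JAVA_OPTS
```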
This can be disabled with the flag -XX:+DisableExplicitGC at startup.

Virtual Machines are Evil

VMware virtual machines, while extremely convenient, also cause particular problems for Java applications: it is very easy for host operating-system resource constraints - such as temporarily insolvent memory availability, or I/O swapping - to cascade into the Java VM and manifest as extremely unusual, frustrating and seemingly illogical problems. We already document some disk I/O metrics with VMware images. Although we now officially support the use of virtual instances, we do not recommend them unless they are maintained correctly: they must be used with due care, proper maintenance and configuration. Moreover, if you are reading this document because of poor performance, the first action should be to remove any virtualization. Emulation will never beat the real thing, and it always introduces more black-box variability into the system.

Use Java 1.6

Java 1.6 is generally regarded, via public discussion, to have an approximate 20% performance improvement over 1.5, and our own internal testing found this figure credible. Java 1.6 is compatible with all supported versions of Confluence, and we strongly recommend that installations not yet on 1.6 migrate.

Use the -server flag

The HotSpot server JVM has specific code-path optimizations which yield an approximate 10% gain over the client version. Most installations will already have this selected by default, but it is still wise to force it with -server, especially on some Windows machines.

If using a 64-bit JRE for larger heaps, use CompressedOops

For every JDK release Sun also builds a "performance" branch in which specifically optimized performance features can be enabled. It is available on the Sun Java SE page after a brief survey, and these builds are certified production grade.
Some blogs have suggested a 25% performance gain and a reduction in heap size when using this parameter. The use and function of the -XX:+UseCompressedOops parameter is discussed in more depth on Sun's official wiki (which itself runs on Confluence!).

Use NUMA if on SPARC, Opteron or recent Intel (Nehalem or Tukwila onwards)

The -XX:+UseNUMA flag enables the Java heap to take advantage of Non-Uniform Memory Architectures: Java will place the data structures relevant to a thread in the memory locations closest to the processor that thread runs on. Depending on the environment, gains can be substantial. Intel markets NUMA as QuickPath Interconnect™.

Use a 32-bit JRE if the heap is under 2GB

Using a 64-bit JRE when the heap is under 2GB causes substantial degradation in effective heap size and performance, because nearly every object, reference, primitive, class and variable uses twice as much memory to be addressed. A 64-bit JRE/JDK is only recommended if heaps greater than 2GB are required; if so, use CompressedOops.

JVM core dumps can be instigated by memory pressures

If your instance of Confluence is throwing Java core dumps, it is known that memory pressure and space/generation sizings can influence the frequency and occurrence of this phenomenon.
If your Tomcat process completely disappears and the logs record something similar to:

    #
    # An unexpected error has been detected by HotSpot Virtual Machine:
    #
    #  SIGSEGV (0xb) at pc=0xfe9bb960, pid=20929, tid=17
    #
    # Java VM: Java HotSpot(TM) Server VM (1.5.0_01-b08 mixed mode)
    # Problematic frame:
    # V  [libjvm.so+0x1bb960]
    #
    # --------------- T H R E A D ---------------
    Current thread (0x01a770e0): JavaThread "JiraQuartzScheduler_Worker-1" [_thread_in_vm, id=17]
    siginfo: si_signo=11, si_errno=0, si_code=1, si_addr=0x00000000
    Registers:
    O0=0xf5999882 O1=0xf5999882 O2=0x00000000 O3=0x00000000
    O4=0x00000000 O5=0x00000001 O6=0xc24ff0b0 O7=0x00008000
    G1=0xfe9bb80c G2=0xf5999a48 G3=0x0a67677d G4=0xf5999882
    G5=0xc24ff380 G6=0x00000000 G7=0xfdbc3800 Y=0x00000000
    PC=0xfe9bb960 nPC=0xfe9bb964

then you should upgrade the JVM. See SIGSEGV Segmentation Fault JVM Crash.

Artificial Windows memory limit

On Windows, the maximum heap allocatable to the 32-bit Tomcat wrapper process is around 1400MB. If the instance's heap is set too close to this limit, chronic garbage collection is likely to result, often producing Java core dumps similar to:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # java.lang.OutOfMemoryError: requested 8388608 bytes for GrET in C:\BUILD_AREA\jdk6_18\hotspot\src\share\vm\utilities\growableArray.cpp. Out of swap space?
    #
    #  Internal Error (allocation.inline.hpp:39), pid=11572, tid=12284
    #  Error: GrET in C:\BUILD_AREA\jdk6_18\hotspot\src\share\vm\utilities\growableArray.cpp
    #
    # JRE version: 6.0_18-b07
    # Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode windows-x86 )
    # If you would like to submit a bug report, please visit:
    #   http://java.sun.com/webapps/bugreport/crash.jsp
    #
    # --------------- T H R E A D ---------------
    Current thread (0x002af800): GCTaskThread [stack: 0x00000000,0x00000000] [id=12284]

or:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # java.lang.OutOfMemoryError: requested 123384 bytes for Chunk::new. Out of swap space?
    #
    #  Internal Error (allocation.cpp:215), pid=10076, tid=4584
    #  Error: Chunk::new
    #
    # JRE version: 6.0_18-b07
    # Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode windows-x86 )
    # If you would like to submit a bug report, please visit:
    #   http://java.sun.com/webapps/bugreport/crash.jsp
    #
    # --------------- T H R E A D ---------------
    Current thread (0x6ca4d000): JavaThread "CompilerThread1" daemon [_thread_in_native, id=4584, stack(0x6cd10000,0x6cd60000)]

Workarounds include reducing the maximum heap to comfortably below this limit, or moving to a 64-bit JRE (with CompressedOops) where a larger heap is genuinely required.
Instigate useful monitoring techniques

At all times the best performance-tuning recommendations are based on current, detailed metrics. This data is easily available and configurable, and helps us tremendously at Atlassian when diagnosing reported performance regressions.
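For garbage collection specifically, verbose GC logging is the cheapest source of such metrics. A sketch of the standard Sun HotSpot flags; the log file location is an assumption, so adjust it for your installation:

```shell
# Append timestamped, detailed GC events to gc.log for later analysis.
CATALINA_OPTS="${CATALINA_OPTS:-} -verbose:gc -Xloggc:gc.log \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
export CATALINA_OPTS
```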
A range of good monitoring tools and documentation is available for observing the JVM at runtime.
Tuning the frequency of full collections

The JVM will generally only collect the full heap when it has no other alternative, because of the relative size of the Tenured heap (it is typically larger than Eden) and the natural probability that objects within Tenured are not eligible for collection, i.e. they are still alive. Some installations can trundle along, only ever collecting in the Eden space. As time goes on, some objects will survive the initial Eden collection and be promoted to Tenured. At some point an object will be dereferenced and no longer reachable through the object graph; however, the occupied memory will still be held in limbo as "dead" memory until a collection occurs in the Tenured space to clear and compact it. It is not uncommon for moderately sized Confluence installations to reclaim as much as 50% of the current heap size on a full collection, precisely because full collections occur so infrequently. By reducing the occupancy fraction that triggers a collection, more memory is available at any given time, so fewer swapping/object collections occur during the busy hour. Atlassian classifies frequency tuning of collections as an advanced topic for further experimentation, provided for informational purposes only; unfortunately, it is impractical for Atlassian to support these kinds of changes in general.

Performance tuning works

Atlassian has a number of high-profile and some extremely demanding, mission-critical clients who have successfully - usually through trial and error - applied these recommendations to production instances and significantly improved them. For more information, please file a support case at support.atlassian.com.
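As a concrete (and, per the caveat above, unsupported) illustration of tuning the frequency of full collections, the CMS collector exposes the occupancy-fraction trigger directly; the 60% threshold below is an arbitrary example value:

```shell
# Start a concurrent Tenured collection once it is 60% occupied, rather
# than waiting for the JVM's own ergonomic estimate. Requires the CMS
# collector.
JAVA_OPTS="${JAVA_OPTS:-} -XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly"
export JAVA_OPTS
```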
Document generated by Confluence on Mar 16, 2011 18:32