This page last changed on Jan 22, 2006 by cmiller.

One major concern is Confluence's ability to withstand a Slashdotting, and someone told us that Atlassian had basically said that Confluence could not handle the load of such an event/attack.

Ideally I would want to put a Squid cache directly in front of Confluence, set the default policy to cache the content of normal pages for ~5 minutes (at least), and then pass through the more dynamic pages (like the editor and such).

This is, in fact, the case. We don't have any deployed Confluence sites that have the requirement of being Slashdot-proof, but this is probably one of those chicken-and-egg things.

The problem is not one of simple scalability. We're currently working on "Confluence Massive", a clusterable Confluence that will scale to handle whatever load you feel like throwing at it. But if your aim is to protect the server against sudden, transient loads, throwing a cluster at the problem that will then spend 99% of its time not being utilised is probably a waste. Thus, the best solution is to have some kind of caching reverse-proxy that will divert load away from Confluence itself.

The main problem with the reverse-proxy solution is that every Confluence page is built dynamically for whichever user is currently accessing it. This affects obvious stuff like the "You are logged in as username" notice, less obvious stuff like the "edit" and "attachments" links that appear or disappear based on whether the user has permission to perform the action on the other end of the link, and even less obvious stuff like wiki-links to spaces the user can't see, or in-page macros that output their content based on the user's identity.

To run Confluence behind a caching reverse-proxy, you'd need one of:

  1. A proxy that understood the user's identity, or
  2. A Confluence site that removed all the personalised content for cacheable pages.

If you had (1), you could tell the proxy to cache content only for anonymous users (since all anon content is the same, and to survive a slashdotting you only really have to worry about the sudden influx of non-logged-in users). That said, (1) is quite tricky, as it relies on the existence of some SSO mechanism that both Confluence and Squid can be hooked into. If such a mechanism existed, though, it'd be a really neat solution.
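
As a rough illustration of option (1), the decision the proxy would have to make can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the cookie name `login_token` is hypothetical (the real marker of a logged-in session depends on how Confluence and the SSO mechanism are set up), and the function names are made up for the example.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name marking a logged-in session. The real name
# depends on the Confluence / SSO setup and is an assumption here.
LOGIN_COOKIES = {"login_token"}


def is_anonymous(headers: dict) -> bool:
    """Return True when the request carries no sign of a logged-in user."""
    # Normalise header names, since HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "authorization" in lowered:
        return False
    cookie = SimpleCookie()
    cookie.load(lowered.get("cookie", ""))
    return not any(name in cookie for name in LOGIN_COOKIES)


def may_serve_from_cache(method: str, headers: dict) -> bool:
    """Only plain reads from anonymous visitors are cache candidates."""
    return method in ("GET", "HEAD") and is_anonymous(headers)
```

Everything that fails this check would be passed straight through to Confluence, so logged-in users keep their personalised pages while the anonymous crowd is served from the cache.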

In the absence of SSO, you've got (2), which involves:

  • Theme Confluence so that the 'view page', 'view blog post' and 'view mail' pages contain no personalised content: no profile link or user identity, and all links to other functions shown whether or not the user has permission to access them.
  • Ensure that all wiki pages on the server are meant to be visible to anonymous users.
  • Disable (or avoid the use of) macros that deliver different content based on user identity.
  • Introduce an interceptor into Confluence that provides If-Modified-Since/Last-Modified conditional GET support for wiki pages.
  • Configure Confluence so the site root URL points to a page, rather than the dashboard.
  • Configure Squid to cache the 'view page' URLs (/display/*, /pages/viewpage.action, /pages/viewblogpost.action).

This is assuming that only the site root or a regular wiki page would ever be the victim of a direct slashdotting, but I figure this is a reasonable enough assumption to make.
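
To make the set of cacheable 'view page' URLs concrete, here is a small Python sketch of the matching rule. It is not Squid configuration, just the logic such a rule would encode, and it assumes Confluence is deployed at the root context path; the function name is made up for the example.

```python
from urllib.parse import urlsplit

# The 'view page' URLs named in the list above. Anything else (editor,
# dashboard, admin screens) is treated as dynamic and passed through.
CACHEABLE_PREFIXES = ("/display/",)
CACHEABLE_PATHS = {"/pages/viewpage.action", "/pages/viewblogpost.action"}


def is_cacheable_url(url: str) -> bool:
    """Return True if the URL's path is one of the 'view page' URLs."""
    path = urlsplit(url).path  # the query string is ignored for matching
    return path in CACHEABLE_PATHS or path.startswith(CACHEABLE_PREFIXES)
```

For example, `is_cacheable_url("/display/DOC/Home")` is true, while `is_cacheable_url("/pages/editpage.action?pageId=123")` is false. The site root is not matched here; if it points at a page, it would need its own rule.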

With conditional GET supported, you could configure Squid to query the server to see whether a page has changed, and set sensible defaults for two values. The maximum time to cache any page could be 5 minutes or so, since pages may contain dynamic content. The minimum gap between If-Modified-Since queries could be 15 seconds, which would easily prevent the server from being overloaded while making sure that in regular use you'd rarely edit a page and then be unable to see your own changes.
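
The two halves of this scheme can be sketched in Python. Neither function is Confluence or Squid code: `conditional_get` stands in for the hypothetical interceptor on the Confluence side, and `proxy_action` is one reading of the proxy policy described above (serve from cache within 15 seconds of the last check, revalidate after that, refetch after 5 minutes). The names and return values are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

MAX_AGE = 300      # seconds a cached page may be kept at all (~5 minutes)
MIN_RECHECK = 15   # minimum seconds between If-Modified-Since queries


def conditional_get(last_modified: datetime, if_modified_since: Optional[str]):
    """Server side: return (status, headers) for a page last edited at
    `last_modified`, which must be a timezone-aware UTC datetime."""
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1 s resolution
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # Not Modified, no body needed
    return 200, headers


def proxy_action(age: float, since_last_check: float) -> str:
    """Proxy side: decide what to do with a cached copy that was fetched
    `age` seconds ago and last validated `since_last_check` seconds ago."""
    if age >= MAX_AGE:
        return "refetch"
    if since_last_check >= MIN_RECHECK:
        return "revalidate"  # send If-Modified-Since to Confluence
    return "serve_cached"
```

The point of the 304 path is that Confluence only has to compare a timestamp rather than render the page, so even frequent revalidation stays cheap.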

Let's put this problem in perspective. Confluence does plenty of caching internally and is quite capable of handling heavy page loads:

[jturner@atlassian01 jturner]$ ab -c 100 -n 1000 'http://confluence.atlassian.com/display/DOC/Running+Confluence+Behind+a+Caching+Proxy+Server'
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.121.2.1 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking confluence.atlassian.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests


Server Software:        Orion/2.0.2
Server Hostname:        confluence.atlassian.com
Server Port:            80

Document Path:          /display/DOC/Running+Confluence+Behind+a+Caching+Proxy+Server
Document Length:        17771 bytes

Concurrency Level:      100
Time taken for tests:   134.464524 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      18132000 bytes
HTML transferred:       17771000 bytes
Requests per second:    7.44 [#/sec] (mean)
Time per request:       13446.452 [ms] (mean)
Time per request:       134.465 [ms] (mean, across all concurrent requests)
Transfer rate:          131.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  2841 12922 9676.2  10331   79456
Waiting:     2841 12918 9676.4  10328   79456
Total:       2841 12922 9676.2  10331   79456

Percentage of the requests served within a certain time (ms)
  50%  10331
  66%  14326
  75%  16876
  80%  18574
  90%  25698
  95%  33259
  98%  40656
  99%  44256
 100%  79456 (longest request)
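
For readers unfamiliar with ApacheBench, the summary figures above follow directly from the raw totals. The short Python check below reproduces them from the numbers in the report; the variable names are chosen for the example.

```python
# Raw totals copied from the ApacheBench report above.
requests = 1000
concurrency = 100
elapsed_s = 134.464524
total_bytes = 18132000

# Requests per second: completed requests divided by wall-clock time.
rps = requests / elapsed_s

# Mean time per request as seen by one of the 100 concurrent clients.
per_request_ms = concurrency * elapsed_s * 1000 / requests

# Mean time per request across all concurrent requests.
per_request_overall_ms = elapsed_s * 1000 / requests

# Transfer rate in Kbytes/sec (1 Kbyte = 1024 bytes).
transfer_kb_s = total_bytes / 1024 / elapsed_s

print(f"{rps:.2f} req/s, {per_request_ms:.3f} ms, "
      f"{per_request_overall_ms:.3f} ms, {transfer_kb_s:.2f} KB/s")
```

In other words, with 100 clients hammering a single page, the server sustained about 7.4 requests per second, and each client waited about 13.4 seconds on average for its response.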
Posted by jeff at Feb 03, 2006 20:25
Document generated by Confluence on Mar 22, 2007 20:59