Confluence : Character encodings in Confluence
This page last changed on May 18, 2006 by mryall.
Where character encoding is used
There are three places where character encoding matters to Confluence: the HTTP request and response, the filesystem, and the database.
Problems generally arise when Confluence believes one of the above encodings is different from what it actually is. For example, Confluence might believe the database is using ISO-8859-1 encoding when in fact it is UTF-8 encoded.
Java character encoding
Java always uses a 16-bit Unicode encoding for all char and String data: UTF-16, often loosely called UCS-2. This means that each of the encodings above defines how, at that particular point, characters are converted between Java's native format and some other format that the browser, filesystem or database understands. So when a request comes in to Confluence, we convert it from the request encoding to Java's native format. Then we store that data in the database, converting from the native format to the database's encoding. Retrieving information from the database and sending it back to the browser is the same process in the opposite direction.
Problems with character encodings
If Confluence has the wrong idea about the encoding for one of the above, the problem manifests itself in different ways:
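One such mismatch can be reproduced outside Confluence. The following is a minimal sketch in plain Java, not Confluence code: the class and method names are hypothetical, and it uses `java.nio.charset.StandardCharsets`, which assumes Java 7 or later. It encodes a string as UTF-8 and then decodes the same bytes as ISO-8859-1, which is what effectively happens when one side of a conversion has the wrong idea about the encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Confluence.
class EncodingMismatchDemo {

    // Encode the text as UTF-8, then decode the bytes with the WRONG charset.
    public static String decodeUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset: the text survives unchanged.
    public static String roundTripUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter.
        String original = "caf\u00e9";
        System.out.println(roundTripUtf8(original).equals(original)); // true
        // The two UTF-8 bytes of the accented letter become two separate characters.
        System.out.println(decodeUtf8AsLatin1(original).length());    // 5
    }
}
```

The accented character occupies two bytes in UTF-8, so decoding with a single-byte charset yields two unrelated characters in its place. This is the typical "garbled international text" symptom.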
Configuration of character encodings
The Confluence character encoding is a configuration setting found in Administration > General Configuration, and is available at runtime as Settings.defaultEncoding. It is used in the following parts of the system:
In summary, changing the Confluence character encoding will change your HTTP request and response encoding, and your filesystem encoding as used by exports and Velocity templates.
The database encoding is the responsibility of your JDBC drivers. The drivers are responsible for reading from and writing to the database in its native encoding, and for translating this data to and from Java Strings (Java's native 16-bit Unicode format). For some drivers, such as the MySQL driver, you must set Unicode encoding explicitly in the JDBC URL. For others, the driver is smart enough to determine the database encoding automatically. Ideally, your database itself should be in a Unicode encoding (and we recommend doing this for the simplest configuration), but that is not necessary as long as:
The filesystem encoding is mostly ignored by Confluence, except for the cases where the configuration setting above plays a part (exports, Velocity templates). When attachments are uploaded, they are written as a stream of bytes directly to the filesystem. It is the same when they are downloaded: the bytes from the file InputStream are written directly to the HTTP response.
In some places in Confluence, we use the default filesystem encoding as determined by the JVM and stored in the file.encoding system property (it can be overridden by setting this property at startup). This is the encoding that the Java InputStreamReader and OutputStreamWriter classes use when no encoding is specified. This default should probably never be used; for consistent results across all filesystem access we should be using the encoding set in the General Configuration.
In certain cases we explicitly hard-code the encoding used to read or write data to the filesystem. Two important examples are:
Some application servers, Tomcat for example, have an encoding setting that modifies Confluence URLs before they reach the application. This can prevent access to international pages and attachments (really anything with international characters in the URL). See configuring your Application Server URL encoding.
Advice
In general, always set all character encodings to UTF-8. That includes the database, JDBC drivers, application server, filesystem and Confluence. In certain isolated cases (e.g. Microsoft Windows), it might not be possible to use a fully Unicode filesystem (that is, a default Windows install doesn't support Unicode filenames properly). If so, stick with UTF-8 for everything else and be aware that your operating system might have limitations around international attachments (pre-2.2), backup and restore of international data, etc.
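To illustrate why the application server's URL encoding matters, here is a minimal sketch, assuming plain Java and a hypothetical class name and page title; it is not how Tomcat or Confluence decode URLs internally. It decodes the same percent-encoded path segment with two different charsets using the standard `java.net.URLDecoder.decode(String, String)` method.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; not part of Confluence or Tomcat.
class UrlDecodingDemo {

    // Decode a percent-encoded string using the named charset.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Rethrow as unchecked so callers need no throws clause.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser sending UTF-8 encodes each accented "e" as %C3%A9.
        String encoded = "R%C3%A9sum%C3%A9";

        String asUtf8 = decode(encoded, "UTF-8");        // the intended page title
        String asLatin1 = decode(encoded, "ISO-8859-1"); // a different, garbled title

        System.out.println(asUtf8.length());          // 6
        System.out.println(asLatin1.length());        // 8
        System.out.println(asUtf8.equals(asLatin1));  // false
    }
}
```

Because the two decoded strings differ, a lookup by page title or attachment name fails when the server decodes the URL with a charset other than the one the browser used.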
Document generated by Confluence on Mar 22, 2007 21:00